LNCS 2932 







Lecture Notes in Computer Science 2932 

Edited by G. Goos, J. Hartmanis, and J. van Leeuwen 




Springer 

Berlin 
Heidelberg 
New York 
Hong Kong 
London 
Milan 
Paris 
Tokyo 




Peter Van Emde Boas, Jaroslav Pokorny,
Maria Bielikova, Julius Stuller (Eds.)



SOFSEM 2004: 
Theory and Practice 
of Computer Science 



30th Conference on Current Trends 
in Theory and Practice of Computer Science 
Merin, Czech Republic, January 24-30, 2004
Proceedings 




Springer 




Series Editors 

Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 

Volume Editors 
Peter Van Emde Boas 

University of Amsterdam, Faculty of Sciences 
ILLC - Department of Mathematics and Computer Science 
Plantage Muidergracht 24, 1018 TV Amsterdam, The Netherlands 
E-mail: peter@science.uva.nl 

Jaroslav Pokorny 

Charles University, Faculty of Mathematics and Physics 
Malostranske nam. 25, 118 00 Prague 1, Czech Republic
E-mail: pokorny@ksi.ms.mff.cuni.cz 

Maria Bielikova 

Slovak University of Technology 
Faculty of Informatics and Information Technologies 
Ilkovicova 3, 812 19 Bratislava, Slovak Republic 
E-mail: bielik@elf.stuba.sk 

Julius Stuller 

Academy of Sciences of the Czech Republic, Institute of Computer Science 
Pod Vodarenskou vezi 2, 182 07 Prague 8, Czech Republic 
E-mail: stuller@cs.cas.cz 

Cataloging-in-Publication Data applied for 

A catalog record for this book is available from the Library of Congress. 

Bibliographic information published by Die Deutsche Bibliothek 

Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; 

detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>. 

CR Subject Classification (1998): F.1, F.2, F.3, H.2, I.2, H.3, H.4, C.2
ISSN 0302-9743 

ISBN 3-540-20779-1 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law. 

Springer-Verlag is a part of Springer Science+Business Media

springeronline.com 

(c) Springer-Verlag Berlin Heidelberg 2004 
Printed in Germany 

Typesetting: Camera-ready by author, data conversion by PTP-Berlin, Protago-TeX-Production GmbH 
Printed on acid-free paper SPIN: 10979303 06/3142 5 4 3 2 1 0 




Preface 



The 30th Anniversary Conference on Current Trends in Theory and Practice 
of Computer Science, SOFSEM 2004, took place during January 24-30, 2004, 
in the Hotel VZ Merin, located about 60 km south of Prague on the right shore 
of Slapska prehrada (“Slapy Dam”) in the Czech Republic. 

Having transformed itself over the years from a local event into a fully
international conference, the contemporary SOFSEM tries to keep the best of its
winter-school aspects (the high number of invited talks) together with
multidisciplinary trends in computer science, illustrated this year by the
selection of the following 4 tracks:

— Computer Science Theory (Track Chair: Peter Van Emde Boas) 

— Database Technologies (Track Chair: Jaroslav Pokorny) 

— Cognitive Technologies (Track Chair: Peter Sincak) 

— Web Technologies (Track Chair: Julius Stuller) 

Its aim was, as always, to promote cooperation among professionals from aca- 
demia and industry working in various areas of computer science. 

The 17 SOFSEM 2004 Program Committee members, coming from 9 countries,
evaluated a record 136 submissions. After a careful review process (usually
with at least 3 reviews per paper), followed by detailed discussions at the PC
meeting held on October 2-3, 2003 in Prague, Czech Republic, 59 papers were
selected for presentation at SOFSEM 2004:

— 22 contributed-talk papers selected by the SOFSEM 2004 PC for publication
in the Springer-Verlag LNCS proceedings volume (acceptance rate 19%),
including the best paper from the Student Research Forum,

— 29 contributed-talk papers that will appear in the MatFyzPress proceedings
(acceptance rate 26%),

— 9 Student Research Forum papers (acceptance rate 38%), 8 of which will
appear in the MatFyzPress proceedings.

The Springer-Verlag proceedings were completed by the 10 invited-talk papers.

SOFSEM 2004 was the result of considerable effort by a number of people. It is 
our pleasure to express our thanks to: 

— the SOFSEM Steering Committee for its general guidance, 

— the SOFSEM 2004 Program Committee and additional referees who devoted 
an extraordinary effort to reviewing a huge number of assigned papers (on 
average about 24 papers per PC member), 

— the Springer-Verlag LNCS Executive Editor Mr. Alfred Hofmann for his
continuing trust in SOFSEM,

— Springer-Verlag for publishing the proceedings, and

— the SOFSEM 2004 Organizing Committee for a smooth preparation of the 
conference. 







Special thanks go to: 

— Hana Bilkova from the Institute of Computer Science (ICS), Prague, who
did an excellent job in the completion of the proceedings,

— Michal Busta and Martin Starecek from ICS for realizing the SOFSEM 2004
web pages and a submission and review system that worked perfectly, thus
allowing a smooth PC session in Prague.

Finally we highly appreciate the financial support of our sponsors (ERCIM, 
Microsoft, Deloitte & Touche, SOFTEC Bratislava, Centrum.cz) which assisted 
with the invited speakers and helped the organizers to offer lower student fees. 



November 11, 2003 Peter Van Emde Boas 

Jaroslav Pokorny 
Maria Bielikova 
Julius Stuller 




Advisory Board



Dines Bjørner            Technical University of Denmark, Lyngby, DK
Peter Van Emde Boas      University of Amsterdam, NL
Manfred Broy             Technical University Munich, DE
Michal Chytil            ANIMA Prague, CZ
Georg Gottlob            Vienna University of Technology, AT
Keith G. Jeffery         CLRC RAL, Chilton, Didcot, Oxon, UK
Maria Zemankova          NSF, Washington, DC, USA



Steering Committee



Branislav Rovan, Chair       Comenius University, Bratislava, SK
Miroslav Bartosek            Masaryk University, Brno, CZ
Maria Bielikova              Slovak University of Technology in Bratislava, SK
Keith G. Jeffery             CLRC RAL, Chilton, Didcot, Oxon, UK
Antonin Kucera               Masaryk University, Brno, CZ
Julius Stuller               Institute of Computer Science, Prague, CZ
Gerard Tel                   Utrecht University, NL
Petr Tuma                    Charles University in Prague, CZ
Jan Staudek, Observer        Masaryk University, Brno, CZ
Jiri Wiedermann, Observer    Institute of Computer Science, Prague, CZ



Program Committee 



Peter Van Emde Boas, Chair     University of Amsterdam, NL
Jaroslav Pokorny, Co-chair     Charles University, Prague, CZ
Peter Sincak, Co-chair         Technical University, Kosice, SK
Julius Stuller, Co-chair       Institute of Computer Science, Prague, CZ
Witold Abramowicz              Poznan University of Economics, PL
Martin Beran                   Charles University, Prague, CZ
Maria Bielikova                Slovak University of Technology, Bratislava, SK
Shuji Hashimoto                Waseda University/Humanoid Institute, JP
Juraj Hromkovic                RWTH University, Aachen, D
Leonid Kalinichenko            Institute for Problems of Informatics, RU
Vladimir Kvasnicka             Slovak University of Technology, Bratislava, SK
Roman Neruda                   Institute of Computer Science, Prague, CZ
Dimitris Plexousakis           University of Crete, GR
Michael Schroder               City University, London, UK
Vaclav Snasel                  VSB-TU Ostrava, CZ
Gerard Tel                     Utrecht University, NL
Bernhard Thalheim              Brandenburg Technical University, Cottbus, D







Additional Referees 



Eduardo Alonso
Vangelis Angelakis
Grigoris Antoniou
Yannis Askoxylakis
Krzysztof Banaskiewicz
Andrzej Bassara
David Bednarek
Hans-Joachim Boeckenhauer
Hans L. Bodlaender
Dirk Bongartz
Peter A.N. Bosman
Harry M. Buhrman
Lubomir Bulej
Bernadette Charron-Bost
David Coufal
Panos Dafas
Martin Doerr
Gunar Fiedler
Miroslav Galbavy
Christos Georgis
Jurgen Giesl
Jacek Gomoluch
Jurriaan Hage
Pavel Hlousek
Martin Holena
Tomasz Kaczmarek
Ivan Kapustik
Stamatis Karvounarakis
George Kokkinidis
Manolis Koubarakis
Marek Kowalkiewicz
Stano Krajci
Jaroslav Kral
Ivan Kramosil
Kiriakos Kritikos
Joachim Kupke
Rasto Lencses
Peter Lennartz
Maarten Marx
Vladimir Mencl
Daniel Moody
Tshiamo Motshegwa
Manfred Nagl
Athanasis Nikolaos
Petr Pajas
Demos Panagopoulos
Nikos Papadopoulos
Stefan Porubsky
Jiri Pospichal
Anthony Savidis
Ralf Schweimeier
Sebastian Seibert
George Serfiotis
Manolis Spanakis
Branislav Steinmuller
Yannis Stylianou
Ioannis Tollis
Vojtech Toman
Nikolaos M. Tsatsakis
Petr Tuma
Walter Unger
Marinus Veldhorst
Peter Verbaan
Krzysztof Wecel
Marek Wisniewski
Jakub Yaghob
Pawel Zebrowski
Stanislav Zak
Michal Zemlicka






Organization

The 30th Anniversary SOFSEM 2004 was organized by 

Institute of Computer Science, Academy of Sciences of the Czech Republic, 
Prague 

Charles University, Faculty of Mathematics and Physics, Prague 
Faculty of Informatics, Masaryk University, Brno 
Institute of Computer Science, Masaryk University, Brno 
Czech Society for Computer Science 

in co-operation with the Slovak Society for Computer Science 



Organizing Committee



Julius Stuller, Chair    Institute of Computer Science, Prague, CZ
Hana Bilkova             Institute of Computer Science, Prague, CZ
Martina Brodska          Action M Agency, Prague, CZ
Michal Busta             Institute of Computer Science, Prague, CZ
Zuzana Hajkova           Action M Agency, Prague, CZ
Martin Starecek          Institute of Computer Science, Prague, CZ
Milena Zeithamlova       Action M Agency, Prague, CZ



Sponsoring Institutions

ERCIM 
Microsoft 
Deloitte & Touche 
SOFTEC Bratislava 
Centrum.cz 




Table of Contents

Invited Talks

Games, Theory and Applications
    H.J. van den Herik, H.H.L.M. Donkers

Database Research Issues in a WWW and GRIDs World
    K.G. Jeffery

Integration, Diffusion, and Merging in Information Management Discipline
    V. Kumar

Flexibility through Multiagent Systems: Solution or Illusion?
    P.C. Lockemann, J. Nimis

World Wide Web Challenges: Supporting Users in Search and Navigation
    N. Milic-Frayling

Querying and Viewing the Semantic Web: An RDF-Based Perspective
    D. Plexousakis

Knowledge Acquisition and Processing: New Methods for Neuro-Fuzzy Systems
    D. Rutkowska

Algorithms for Scalable Storage Servers
    P. Sanders

Fuzzy Unification and Argumentation for Well-Founded Semantics
    R. Schweimeier, M. Schroeder

Tree Signatures and Unordered XML Pattern Matching
    P. Zezula, F. Mandreoli, R. Martoglia

Regular Papers

Quantum Query Complexity for Some Graph Problems
    A. Berzina, A. Dubrovsky, R. Freivalds, L. Lace, O. Scegulnaja

A Model of Versioned Web Sites
    M. Bielikova, I. Noris

Design of Secure Multicast Models for Mobile Services
    E. Blessing, R., R. Uthariaraj, V.

Some Notes on the Complexity of Protein Similarity Search under mRNA
Structure Constraints
    D. Bongartz

Measures of Intrinsic Hardness for Constraint Satisfaction Problem Instances
    G. Boukeas, C. Halatsis, V. Zissimopoulos, P. Stamatopoulos

Validity Conditions in Agreement Problems and Time Complexity
    B. Charron-Bost, F. Le Fessant

Supporting Evolution in Workflow Definition Languages
    S.M. Fernandes, J. Cachopo, A.R. Silva

Clustered Level Planarity
    M. Forster, C. Bachmaier

Artificial Perception: Auditory Decomposition of Mixtures of Environmental
Sounds - Combining Information Theoretical and Supervised Pattern
Recognition Approaches
    L. Janku

Features of Neighbors Spaces
    M. Jirina, M. Jirina, Jr.

Discovery of Lexical Entries for Non-taxonomic Relations in Ontology Learning
    M. Kavalec, A. Maedche, V. Svatek

Approaches Based on Markovian Architectural Bias in Recurrent Neural Networks
    M. Makula, M. Cernansky, L. Benuskova

Processing XPath Expressions in Relational Databases
    T. Pankowski

An Embedded Language Approach to Router Specification in Curry
    J.G. Ramos, J. Silva, G. Vidal

Multi-document Automatic Text Summarization Using Entropy Estimates
    G. Ravindra, N. Balakrishnan, K.R. Ramakrishnan

Implicit Flow Maximization by Iterative Squaring
    D. Sawitzki

Evolving Constructors for Infinitely Growing Sorting Networks and Medians
    L. Sekanina

Fuzzy Group Models for Adaptation in Cooperative Information Retrieval
Contexts
    M.-A. Sicilia, E. Garcia

Theory of One Tape Linear Time Turing Machines
    K. Tadaki, T. Yamakami, J.C.H. Lin

Avoiding Forbidden Submatrices by Row Deletions
    S. Wernicke, J. Alber, J. Gramm, J. Guo, R. Niedermeier

Building a Bridge between Mirror Neurons and Theory of Embodied Cognition
    J. Wiedermann

The Best Student Paper

Fully Truthful Mechanisms
    N. Chen, H. Zhu

Author Index




Games, Theory and Applications 



H.J. van den Herik and H.H.L.M. Donkers 

Institute for Knowledge and Agent Technology (IKAT),
Department of Computer Science, Universiteit Maastricht
P.O. Box 616, 6200 MD, Maastricht, The Netherlands.

{herik,donkers}@cs.unimaas.nl



Abstract. Computer game-playing is a challenging topic in artificial 
intelligence. The recent results by the computer programs Deep Blue 
(1996, 1997) and Deep Junior (2002) against Kasparov show the power 
of current game-tree search algorithms in Chess. This success is owed 
to the fruitful combination of the theoretical development of algorithms 
and their practical application. As an example of the theoretical develop- 
ment we discuss a game-tree algorithm called Opponent-Model search. 
In contrast to most current algorithms, this algorithm uses an opponent 
model to predict the opponent’s moves and uses these predictions to lure 
the opponent into uncomfortable positions. We concentrate on the time 
complexity of two different implementations of the algorithm and show 
how these are derived. Moreover, we discuss some possible dangers when 
applying Opponent-Model search in practice. 



1 Games 

From the very beginning, game-playing has been studied in Artificial Intelligence. 
In [1] an overview is given showing that research in this domain has led to 
a variety of successes. Examples of computer programs that defeated the best 
human players occurred in Chess, Checkers, Draughts, and Othello. Still, there 
are many additional challenges in this area and in domains of other games. One 
of them is the application of knowledge of the opponent’s strategy. 

The idea of anticipating the opponent’s strategy is not new. As a simple 
example (from [2]), we consider playing TicTacToe by the following ordered 
strategy S: 

1. If completing three-in-a-row is possible, do so. 

2. If the opponent threatens completing three-in-a-row, prevent this. 

3. Occupy the central square whenever possible. 

4. Occupy a corner square whenever possible. 

TicTacToe is known to be drawn, and it might be questioned whether knowledge
of the opponent’s strategy could improve on this result. Intuitively, it seems
clear that S should achieve a draw since it correctly evaluates the squares and
acts on this evaluation. Yet, a program aware of the opponent’s strategy S may
win. Allow the program the first move as X; the following sequence of moves
then causes player X to win, where at moves 2 and 4 player O follows S.



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 1-8, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004




[Figure: five TicTacToe boards showing the positions after move 1, move 2,
move 3, move 4, and move 5.]

The win by X is due to X’s awareness of the opponent’s strategy S, admittedly 
non-optimal, or to rephrase this statement, due to X’s successful prediction of 
O’s moves. 
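The exploitation of S can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the squares are numbered 0-8 row by row, ties inside S (which corner to take, which threat to block) are broken by the lowest-numbered square, S falls back to the first free square when none of its four rules applies, and X opens with two opposite corners. These tie-breaking rules and the helper names (`completing_move`, `strategy_s`, `exploiter`, `play`) are hypothetical, so the move sequence need not coincide with the one shown in the figure.

```python
# All eight three-in-a-row lines on a board numbered 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def completing_move(board, player):
    """Return a square that completes three-in-a-row for player, else None."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(player) == 2 and cells.count(" ") == 1:
            return line[cells.index(" ")]
    return None


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def strategy_s(board, me, opp):
    """Ordered strategy S (rules 1-4); the final fallback is an assumption."""
    move = completing_move(board, me)            # rule 1: complete a row
    if move is None:
        move = completing_move(board, opp)       # rule 2: block a threat
    if move is None and board[4] == " ":
        move = 4                                 # rule 3: central square
    if move is None:
        corners = [i for i in (0, 2, 6, 8) if board[i] == " "]
        move = corners[0] if corners else None   # rule 4: a corner square
    if move is None:
        move = board.index(" ")                  # assumed fallback
    return move


def exploiter(board, opening=(0, 8)):
    """X wins or blocks when forced; otherwise it plays its scripted corners."""
    move = completing_move(board, "X")
    if move is None:
        move = completing_move(board, "O")
    if move is None:
        free = [i for i in opening if board[i] == " "]
        move = free[0] if free else board.index(" ")
    return move


def play():
    """Play X (exploiter) against O (strategy S); return (winner, moves)."""
    board, history = [" "] * 9, []
    for ply in range(9):
        x_to_move = ply % 2 == 0
        move = exploiter(board) if x_to_move else strategy_s(board, "O", "X")
        board[move] = "X" if x_to_move else "O"
        history.append(move)
        if winner(board):
            break
    return winner(board), history


if __name__ == "__main__":
    print(play())
```

Under these assumptions X's fifth-ply move blocks O's diagonal and at the same time creates two threats, of which S can parry only one.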



2 Opponent-Model Search 

OM search [3], [4], [5] is a game-tree search algorithm that uses a player’s
hypothesized model of the opponent in order to exploit weak points in the
opponent’s search strategy. The opponent model may be correct, but more
frequently it contains some small errors. Therefore, it can be a help as well
as a hindrance in playing the opponent. The OM-search algorithm is based on
three strong assumptions concerning the opponent and the player:

1. the opponent (called MIN) uses minimax (or an equivalent algorithm) with
an evaluation function ($V_{op}$), a search depth, and a move ordering that are
all three known to the first player (called MAX);

2. MAX uses an evaluation function ($V_0$) that is better than MIN’s evaluation
function;

3. MAX searches at least as deep as MIN.

Obviously, OM search is still closely related to minimax. In OM search, MAX
maximizes at max nodes, and selects at min nodes the moves that MAX thinks
MIN would select.

Below we provide a brief technical description of OM search, its notation 
and the relations between the nodes in the search tree. Moreover, we mention 
a few enhancements to which adequate references are made. For an extensive 
description of OM search we refer to [6], [7]. 

OM search can be described by the following equations, in which $V_0(\cdot)$
and $V_{op}(\cdot)$ are the evaluation functions, and $v_0(\cdot)$ and
$v_{op}(\cdot)$ are the node values. Subscript ‘0’ is used for MAX values (it
is not strictly necessary, but used to balance with the subscript ‘op’),
subscript ‘op’ is used for MIN values.

\[
v_0(P) =
\begin{cases}
\max_j v_0(P_j) & \text{if $P$ is a max node,}\\
v_0(P_j),\ j = \arg\min_i v_{op}(P_i) & \text{if $P$ is a min node,}\\
V_0(P) & \text{if $P$ is a leaf node.}
\end{cases}
\]
\[
v_{op}(P) =
\begin{cases}
\max_j v_{op}(P_j) & \text{if $P$ is a max node,}\\
\min_j v_{op}(P_j) & \text{if $P$ is a min node,}\\
V_{op}(P) & \text{if $P$ is a leaf node.}
\end{cases}
\tag{1}
\]

If $P$ is a min node at a depth larger than the search-tree depth of the
opponent, then $v_0(P) = \min_j v_0(P_j)$.
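As an illustration, the two mutually dependent node values can be computed by one recursive pass over an explicit tree. The following is a minimal sketch, not the authors' implementation: the `Node` class, the `om_search` and `minimax_v0` names, and the tie-breaking rule (MIN takes the first child with minimal $v_{op}$) are assumptions made for the example, and the special case of min nodes below the opponent's search depth is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                                   # "max", "min", or "leaf"
    children: List["Node"] = field(default_factory=list)
    v0: int = 0                                 # V_0(P), used at leaves only
    vop: int = 0                                # V_op(P), used at leaves only


def om_search(p: Node) -> Tuple[int, int]:
    """Return the pair (v_0(P), v_op(P)) of plain OM search."""
    if p.kind == "leaf":
        return p.v0, p.vop
    results = [om_search(child) for child in p.children]
    if p.kind == "max":
        # MAX maximizes its own value; MIN's model maximizes v_op here.
        return max(r[0] for r in results), max(r[1] for r in results)
    # Min node: MIN is predicted to pick the child with minimal v_op,
    # and MAX inherits the v_0 value of exactly that child.
    j = min(range(len(results)), key=lambda i: results[i][1])
    return results[j]


def minimax_v0(p: Node) -> int:
    """Ordinary minimax on V_0, for comparison."""
    if p.kind == "leaf":
        return p.v0
    values = [minimax_v0(child) for child in p.children]
    return max(values) if p.kind == "max" else min(values)


# A depth-2 example: MAX to move at the root, MIN at the two children.
root = Node("max", [
    Node("min", [Node("leaf", v0=8, vop=6), Node("leaf", v0=3, vop=7)]),
    Node("min", [Node("leaf", v0=5, vop=4), Node("leaf", v0=6, vop=9)]),
])

if __name__ == "__main__":
    print(om_search(root), minimax_v0(root))
```

In this small tree MAX expects MIN to answer the first move with the leaf valued 6 by $V_{op}$, which MAX itself values at 8, so OM search prefers the first move, whereas minimax on $V_0$ alone prefers the second.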







3 Algorithms 

The equations (1) in the previous section do not prescribe how Opponent-Model
search should be implemented efficiently. An important aspect of game-tree
search algorithms is pruning, which means that portions of the tree are
disregarded because those parts cannot influence the result. The more an
algorithm is able to prune from a tree, the faster the tree is searched.
β-pruning OM search [4] is an improvement of plain OM search that is able to
prune in max nodes (not in min nodes, hence the name ‘β’-pruning) and still
yields the same result. The pruning is analogous to α-β pruning: when for one
of the children $P_j$ of a non-leaf max node $P$ the value of $v_{op}(P_j)$ is
higher than the resultant value of the already evaluated siblings of the max
node, then the remaining children of the max node can be pruned.

We present in Fig. 1 two implementations of β-pruning OM search: a one-pass
version that visits nodes only once (denoted by $\mathrm{OM}^{\beta 1p}$) and a
version with so-called α-β probes that uses α-β search to predict MIN’s moves
(denoted by $\mathrm{OM}^{\beta pb}$). For a detailed explanation of these
algorithms we refer to [7].



Algorithm OM^{β1p}(P, β):

    if (P is leaf node) return (V_0(P), V_op(P), 0)
    if (P is max node)
        v_0* ← −∞ ; v_op* ← −∞
        for (all children P_j of P)
            if (P_j is leaf node)
                v_op ← V_op(P_j); if (v_op < β) v_0 ← V_0(P_j)
            else (v_0, v_op, k) ← OM^{β1p}(P_j, β)
            if (v_0 > v_0*) v_0* ← v_0; j* ← j
            if (v_op > v_op*) v_op* ← v_op
            if (v_op* > β) break
    if (P is min node)
        v_op* ← β
        for (all children P_j of P)
            if (P_j is leaf node) v_op ← V_op(P_j)
            else (v_0, v_op, k) ← OM^{β1p}(P_j, v_op*)
            if (v_op < v_op*) v_op* ← v_op; v_0* ← v_0; j* ← j
        if (P_j* is leaf node ∧ v_op* < β) v_0* ← V_0(P_j*)
    return (v_0*, v_op*, j*)

Algorithm OM^{βpb}(P, β):

    if (P is leaf node) return (V_0(P), 0)
    if (P is max node)
        v_0* ← −∞
        for (all children P_j of P)
            (v_0, k) ← OM^{βpb}(P_j, β)
            if (v_0 > v_0*) v_0* ← v_0; j* ← j
    if (P is min node)
        (v_op*, j*) ← α-β search(P, −∞, β, V_op(·))
        (v_0*, k) ← OM^{βpb}(P_j*, v_op* + 1)
    return (v_0*, j*)

Fig. 1. Two implementations of β-pruning OM search. The first algorithm is a
one-pass version and the second algorithm uses α-β probing
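The probing variant of Fig. 1 translates fairly directly into executable form. The sketch below is one possible rendering and makes several assumptions that are not in the source: trees are explicit `Node` objects, evaluation values are integers (so that the window bound $v_{op}^* + 1$ is meaningful), ties are resolved in favour of the first child, and the probe is a fail-soft α-β search. A plain OM search (`om_plain`) and a random tree builder (`build_tree`) are included only to check that both routines return the same value for MAX.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str                                   # "max", "min", or "leaf"
    children: List["Node"] = field(default_factory=list)
    v0: int = 0                                 # V_0 at leaves
    vop: int = 0                                # V_op at leaves


def alphabeta_vop(p, alpha, beta):
    """Fail-soft alpha-beta on V_op; returns (value, index of best child)."""
    if p.kind == "leaf":
        return p.vop, None
    best_j = None
    if p.kind == "max":
        best = -math.inf
        for j, child in enumerate(p.children):
            v, _ = alphabeta_vop(child, max(alpha, best), beta)
            if v > best:
                best, best_j = v, j
            if best >= beta:
                break                           # beta cut-off
    else:
        best = math.inf
        for j, child in enumerate(p.children):
            v, _ = alphabeta_vop(child, alpha, min(beta, best))
            if v < best:
                best, best_j = v, j
            if best <= alpha:
                break                           # alpha cut-off
    return best, best_j


def om_probe(p, beta):
    """OM search with alpha-beta probes; returns (v_0*, j*)."""
    if p.kind == "leaf":
        return p.v0, None
    if p.kind == "max":
        best, best_j = -math.inf, None
        for j, child in enumerate(p.children):
            v0, _ = om_probe(child, beta)
            if v0 > best:
                best, best_j = v0, j
        return best, best_j
    # Min node: a probe predicts MIN's move, then only that child is searched.
    vop, j = alphabeta_vop(p, -math.inf, beta)
    v0, _ = om_probe(p.children[j], vop + 1)
    return v0, j


def om_plain(p):
    """Plain OM search without pruning; returns (v_0, v_op)."""
    if p.kind == "leaf":
        return p.v0, p.vop
    results = [om_plain(child) for child in p.children]
    if p.kind == "max":
        return max(r[0] for r in results), max(r[1] for r in results)
    return results[min(range(len(results)), key=lambda i: results[i][1])]


def build_tree(depth, width, rng, kind="max"):
    """Uniform alternating tree with random integer leaf evaluations."""
    if depth == 0:
        return Node("leaf", v0=rng.randint(0, 20), vop=rng.randint(0, 20))
    nxt = "min" if kind == "max" else "max"
    return Node(kind, [build_tree(depth - 1, width, rng, nxt)
                       for _ in range(width)])


if __name__ == "__main__":
    tree = build_tree(4, 3, random.Random(0))
    print(om_probe(tree, math.inf)[0], om_plain(tree)[0])
```

The design choice worth noting is that the probe at a min node searches the whole subtree for MIN, after which only the predicted child is searched for MAX; subtrees are therefore visited more than once, in contrast to the one-pass version.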



4 Time-Complexity 

Below we discuss the best-case behaviour of the one-pass version of β-pruning
OM search ($\mathrm{OM}^{\beta 1p}$) and of the version with α-β probes
($\mathrm{OM}^{\beta pb}$). Thereafter a comparison is performed.

Best-case analysis of $\mathrm{OM}^{\beta 1p}$. The effect of β-pruning in
$\mathrm{OM}^{\beta 1p}$ is dependent on the ordering of child nodes with
respect to $v_{op}(\cdot)$. In uniform trees, the pruning is maximal if the
tree is well ordered, which means that the child nodes $h_j$ of all nodes $h$
are sorted on their value of $v_{op}(h_j)$. For max nodes the child nodes must
be ordered in decreasing order and for min nodes in increasing order.

Proposition 1. Any algorithm that implements β-pruning OM search will need
at least the number of evaluations on a well-ordered uniform tree that
$\mathrm{OM}^{\beta 1p}$ needs.

Proof. To prove this proposition, we first determine the minimal number. Then
we show that $\mathrm{OM}^{\beta 1p}$ uses exactly this number of evaluations
in the best case.

Fig. 2 gives a schematic representation of a well-ordered uniform tree to
which β-pruning OM search is applied. From this figure we derive a formula in
closed form for the number of evaluations at least needed for β-pruning OM
search. Assume a uniform tree with branching factor $w$ and depth $d$. There
are $w^2$ nodes at depth 2 to be expanded in such a search tree, because only
branches at max nodes can be pruned and no pruning takes place at the root.
The subtrees at these max nodes are of two different types. The first type of
subtrees (type A) are the subtrees on the leftmost branch of every node at
depth 1. These subtrees are of the same type as the original tree and their
number is $w$. The other $w(w-1)$ subtrees (type B) only have to be considered
for the opponent’s evaluation using α-β search with window $[-\infty, \beta]$.
In Fig. 2, one of the type-B subtrees is worked out in detail. In this
subtree, β pruning takes place directly under the root, so only one min node
remains at the first level. Directly below this min node, no α pruning can
take place. The subtree at the first max child node of this min node is again
of type B. The subtrees at the other max child nodes also are of type B. It is
not possible that α pruning takes place in these subtrees because the maximal
β pruning at all max nodes prohibits the passing-through of α values.

Fig. 2. Theoretically best-case tree for β-pruning OM search

The minimum number of evaluations $C_{OM}$ (i.e., the best case for any
implementation of β-pruning OM search) can now be given in the following
recursive expression:



\[
C_{OM}(d,w) = w\,C_{OM}(d-2,w) + w(w-1)\,C'_{OM}(d-2,w), \qquad
C_{OM}(1,w) = 2w;\quad C_{OM}(2,w) = w(w+1)
\tag{2}
\]

In these formulae, $C_{OM}(d,w)$ stands for the number of evaluations needed
at a subtree of type A, and $C'_{OM}(d,w)$ for a subtree of type B. In the case
that the type-A subtree has depth 1, $2w$ evaluations are needed: $v_0(\cdot)$
and $v_{op}(\cdot)$ have to be evaluated for all $w$ leaf min nodes. When it
has depth 2, $w(w+1)$ evaluations are needed: for all $w^2$ leaf max nodes,
$v_{op}(\cdot)$ has to be obtained, but only for the leftmost max node at
every min child node is $v_0(\cdot)$ needed. The value of $C'_{OM}(d,w)$ is
given by the next recursive expression:

\[
C'_{OM}(d,w) = w\,C'_{OM}(d-2,w), \qquad
C'_{OM}(1,w) = 1;\quad C'_{OM}(2,w) = w
\tag{3}
\]

In the case that the type-B subtree has depth 1, only 1 evaluation is needed in
the best case, since the value of $v_{op}(\cdot)$ for the first child will be
greater than $\beta$. Hence $v_0(\cdot)$ does not have to be obtained. If the
type-B subtree has depth 2, $w$ evaluations are needed, one for every
grandchild of the first child node. Since all values for $v_{op}(\cdot)$ will
be greater than $\beta$, no value of $v_0(\cdot)$ has to be obtained.
Formula (3) can easily be written in closed form:

\[
C'_{OM}(d,w) = w^{\lfloor d/2 \rfloor}
\tag{4}
\]

The equation for $C_{OM}$ can also be written in closed form (which can be
found by applying repeated substitution):

\[
C_{OM}(d,w) = k\,w^{\lceil d/2 \rceil}
  + (w-1) \sum_{i=1}^{\lceil d/2 \rceil - 1} w^{i}\, C'_{OM}(d-2i,w)
\tag{5}
\]

($k = 2$ if $d$ is odd, $k = w + 1$ if $d$ is even.) The validity of the closed
form can be proven by complete induction on $d$. For $d = 1$ and $d = 2$
equation (5) is clearly correct: the summation on the right-hand side is zero
in both cases. For $d > 2$ we first write down equation (5) with parameter
$d - 2$:

\[
\begin{aligned}
C_{OM}(d-2,w)
 &= k\,w^{\lceil (d-2)/2 \rceil}
    + (w-1) \sum_{i=1}^{\lceil (d-2)/2 \rceil - 1} w^{i}\, C'_{OM}(d-2-2i,w)\\
 &= k\,w^{\lceil d/2 \rceil - 1}
    + (w-1) \sum_{i=1}^{\lceil d/2 \rceil - 2} w^{i}\, C'_{OM}(d-2-2i,w)\\
 &= k\,w^{\lceil d/2 \rceil - 1}
    + (w-1) \sum_{i=2}^{\lceil d/2 \rceil - 1} w^{i-1}\, C'_{OM}(d-2i,w)
\end{aligned}
\tag{6}
\]

Substituting equation (6) into (2) results directly in equation (5), which
proves the correctness of the closed form. The closed form of equation (5) can
be reduced further by applying equation (4) and canceling out the summation:

\[
\begin{aligned}
C_{OM}(d,w)
 &= k\,w^{\lceil d/2 \rceil}
    + (w-1) \sum_{i=1}^{\lceil d/2 \rceil - 1} w^{i}\, w^{\lfloor (d-2i)/2 \rfloor}\\
 &= k\,w^{\lceil d/2 \rceil}
    + (w-1) \sum_{i=1}^{\lceil d/2 \rceil - 1} w^{\lfloor d/2 \rfloor}\\
 &= k\,w^{\lceil d/2 \rceil}
    + (w-1)\,(\lceil d/2 \rceil - 1)\, w^{\lfloor d/2 \rfloor}
\end{aligned}
\tag{7}
\]

The expression $k\,w^{\lceil d/2 \rceil}$ can be rewritten to
$w^{\lfloor d/2 \rfloor + 1} + w^{\lceil d/2 \rceil}$, which removes the $k$.
This can be used to rewrite the equation to:

\[
C_{OM}(d,w) = w^{\lceil d/2 \rceil} + w^{\lfloor d/2 \rfloor}
  + (w-1)\,\lceil d/2 \rceil\, w^{\lfloor d/2 \rfloor}
\tag{8}
\]

This concludes the first part of the proof. The number of evaluations in the
best case for $\mathrm{OM}^{\beta 1p}$ appears to be equal to $C_{OM}(d,w)$.
The reasoning is as follows (Fig. 2 can be used to illustrate this reasoning).
The type-A subtrees are of the same type as the original tree, just like the
theoretical case. This means that the overall formula (5) also holds for
$\mathrm{OM}^{\beta 1p}$. However, $\mathrm{OM}^{\beta 1p}$ does not apply α-β
search to the type-B subtrees. Fortunately, in the best case optimal β pruning
on all internal max nodes and on all leaf nodes can take place. So no
evaluation of $V_0$ takes place in type-B subtrees. The number of evaluations
in these type-B subtrees is given by equation (4). Now the theoretical
derivation can be followed. This proves the proposition. □
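The closed form can be checked numerically against the recursive definition. The sketch below evaluates the recurrences for type-A and type-B subtrees directly and compares them with the closed forms $w^{\lfloor d/2 \rfloor}$ and $w^{\lceil d/2 \rceil} + w^{\lfloor d/2 \rfloor} + (w-1)\lceil d/2 \rceil w^{\lfloor d/2 \rfloor}$; the function names are chosen for this illustration only.

```python
def c_type_b(d, w):
    """Recurrence for type-B subtrees: C'(d) = w * C'(d - 2)."""
    if d == 1:
        return 1
    if d == 2:
        return w
    return w * c_type_b(d - 2, w)


def c_type_a(d, w):
    """Recurrence for type-A subtrees, i.e. the whole best-case tree."""
    if d == 1:
        return 2 * w
    if d == 2:
        return w * (w + 1)
    return w * c_type_a(d - 2, w) + w * (w - 1) * c_type_b(d - 2, w)


def c_type_b_closed(d, w):
    return w ** (d // 2)                        # floor(d/2)


def c_type_a_closed(d, w):
    lo, hi = d // 2, (d + 1) // 2               # floor(d/2), ceil(d/2)
    return w ** hi + w ** lo + (w - 1) * hi * w ** lo


if __name__ == "__main__":
    for d in range(1, 9):
        print(d, c_type_a(d, 4), c_type_a_closed(d, 4))
```

For example, with $w = 4$ the recurrence gives $2 \cdot 16 + 4 \cdot 3 \cdot 1 = 44$ evaluations at depth 3, which agrees with $3w^2 - w$ obtained from the closed form.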

Best-case analysis of $\mathrm{OM}^{\beta pb}$. The number of leaf nodes that
are evaluated for MAX’s evaluation function in $\mathrm{OM}^{\beta pb}$ depends
on the size of the tree, not on the ordering of the nodes. For the moment, the
α-β probes can be disregarded. At every max node, all $w$ child nodes are
visited and at every min node, exactly 1 child is visited. This means that
there are exactly $w^{\lceil d/2 \rceil}$ leaf nodes visited and evaluated for
MAX.

The number of α-β probes in OM search, too, is only dependent on the size of
the tree. At every odd ply $2i - 1$ ($i > 0$), exactly $w^i$ probes are
performed. The α-β probes at the first odd ply have a β parameter of $+\infty$
and take $C_{\alpha\text{-}\beta}(d-1, w)$ evaluations. All other α-β probes
have a smaller β parameter. Fig. 3 illustrates the best case for these α-β
probes.

Fig. 3. An example to illustrate the best case for α-β probes. The [α, β]
windows and the subgame values are given next to the nodes

The best case (i.e., the most pruning) for an α-β probe with $\beta = v + 1$
on a min node $a$ occurs when the values of nodes $a \ldots k$ are as indicated
in Fig. 3. A careful inspection of the figure indicates that in the best case,
the β parameter $v + 1$ does not have influence on the pruning in the tree.
The amount of pruning is therefore equal to the best case of α-β search with
an open window on the same tree:
$w^{\lfloor d/2 \rfloor} + w^{\lceil d/2 \rceil} - 1$.

As stated above, there are $w^i$ probes at every odd ply $2i - 1$ that each
cost $C_{\alpha\text{-}\beta}(d-2i+1, w)$ evaluations. Together with the
$w^{\lceil d/2 \rceil}$ evaluations for the max player, β-pruning OM search
with α-β probes in the best case costs:

\[
\begin{aligned}
C_{OM^{\beta pb}}(d,w)
 &= w^{\lceil d/2 \rceil}
    + \sum_{i=1}^{\lceil d/2 \rceil} w^{i}\, C_{\alpha\text{-}\beta}(d-2i+1, w)\\
 &= w^{\lceil d/2 \rceil}
    + \sum_{i=1}^{\lceil d/2 \rceil} w^{i}
      \left( w^{\lfloor (d-2i+1)/2 \rfloor} + w^{\lceil (d-2i+1)/2 \rceil} - 1 \right)\\
 &= w^{\lceil d/2 \rceil}
    + \lceil d/2 \rceil \left( w^{\lfloor (d+1)/2 \rfloor} + w^{\lceil (d+1)/2 \rceil} \right)
    - \frac{w^{\lceil d/2 \rceil + 1} - w}{w - 1}
\end{aligned}
\tag{9}
\]



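A comparison. The next formulae summarize the best-case analyses for
β-pruning OM search:

\[
\mathrm{OM}^{\beta 1p}:\quad
w^{\lceil d/2 \rceil} + w^{\lfloor d/2 \rfloor}
  + (w-1)\,\lceil d/2 \rceil\, w^{\lfloor d/2 \rfloor}
\]
\[
\mathrm{OM}^{\beta pb}:\quad
w^{\lceil d/2 \rceil}
  + \lceil d/2 \rceil \left( w^{\lfloor (d+1)/2 \rfloor} + w^{\lceil (d+1)/2 \rceil} \right)
  - \frac{w^{\lceil d/2 \rceil + 1} - w}{w - 1}
\]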



Despite the closed forms of the functions, their relation is not immediately
clear. In Fig. 4 these functions are plotted next to each other. All four
diagrams show the value of the equation above divided by the best case of α-β
search. In each diagram, the lines give the results for the branching factors
20, 16, 12, 8, and 4 respectively, from top to bottom. Because the behaviour of
the functions differs considerably for odd and even search depths (see
x-axis), we present separate diagrams for both cases. On the left we show the
best-case complexities of $\mathrm{OM}^{\beta 1p}$ and $\mathrm{OM}^{\beta pb}$
for even search depths, on the right for odd depths.

[Figure: four diagrams, two titled "Even depth" and two titled "Odd depth".]

Fig. 4. Best-case results of $\mathrm{OM}^{\beta 1p}$ and
$\mathrm{OM}^{\beta pb}$ compared



In all cases the complexity of $\mathrm{OM}^{\beta 1p}$ is smaller than the
complexity of $\mathrm{OM}^{\beta pb}$. Furthermore, the complexity
approximates a linear function of the (odd or even) search depth for both
$\mathrm{OM}^{\beta 1p}$ and $\mathrm{OM}^{\beta pb}$. It is also approximately
linear in the branching factor for both $\mathrm{OM}^{\beta 1p}$ and
$\mathrm{OM}^{\beta pb}$, but only in the case of even search depths (cf. the
different scaling of the y-axis).

In contrast to the expectation, it is shown in [7] that in the average case
$\mathrm{OM}^{\beta pb}$ appears to be the more efficient of the two. Moreover,
in practical implementations, this version offers better opportunities for the
application of search enhancements which increase the efficiency further.
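The two best-case formulae are easy to tabulate. The following sketch, with function names chosen for this illustration, computes both closed forms, recomputes the probing variant from its defining sum as a cross-check, and prints the ratios to the best case of α-β search, which is the quantity plotted in Fig. 4. A short calculation from the closed forms suggests that the two variants coincide for depths 1 and 2 and that the one-pass version is strictly cheaper from depth 3 onwards; the code checks this for a range of parameters.

```python
def alphabeta_best(d, w):
    """Best case of alpha-beta search with an open window."""
    return w ** (d // 2) + w ** ((d + 1) // 2) - 1


def om_1p_best(d, w):
    """Best case of the one-pass version (closed form)."""
    lo, hi = d // 2, (d + 1) // 2               # floor(d/2), ceil(d/2)
    return w ** hi + w ** lo + (w - 1) * hi * w ** lo


def om_pb_best_sum(d, w):
    """Best case of the probing version, from the defining sum."""
    hi = (d + 1) // 2
    probes = sum(w ** i * alphabeta_best(d - 2 * i + 1, w)
                 for i in range(1, hi + 1))
    return w ** hi + probes


def om_pb_best_closed(d, w):
    """Best case of the probing version (closed form), for w >= 2."""
    hi = (d + 1) // 2
    geometric = (w ** (hi + 1) - w) // (w - 1)  # w + w^2 + ... + w^hi
    return w ** hi + hi * (w ** ((d + 1) // 2) + w ** ((d + 2) // 2)) - geometric


if __name__ == "__main__":
    for w in (4, 8, 12, 16, 20):
        for d in range(3, 11):
            ab = alphabeta_best(d, w)
            print(w, d,
                  round(om_1p_best(d, w) / ab, 2),
                  round(om_pb_best_closed(d, w) / ab, 2))
```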








5 Attractors 

The practical application of Opponent-Model search is subject to several forms
of risk. Next to the obvious risk, being an ill prediction of the opponent’s
moves ([7], [8]), there is a subtle risk that is potentially more dangerous.
This risk is caused by MAX’s overestimation of positions that MIN judges
correctly to be favourable for MIN. Such positions can act as an attractor: MAX
is eager to reach such a position and MIN follows willingly. The larger the
overestimation, the more MAX will be attracted to the position and the larger
the damage will be. In [7], [9] a condition on the evaluation functions $V_0$
and $V_{op}$ is formulated, called admissibility, that should prevent the
occurrence of these attractors. When this risk of OM search is neglected, its
application is bound to fail [7].

6 Conclusion 

In this paper we presented a best-case complexity analysis of two variants of
β-pruning OM search as an example of theoretical research in computer
game-playing. The corresponding practical application is described, among
others, in [7].
Surprisingly, practice leads to a different conclusion with respect to the com- 
plexities of the best cases. Moreover, practice prominently shows several forms 
of risk. 



References 

1. Schaeffer, J., Herik, H.J. van den (eds.): Chips Challenging Champions. Games, 
Computers and Artificial Intelligence. Elsevier Science Publishers, Amsterdam, The 
Netherlands (2002) 

2. Herik, H.J. van den: Informatica en het menselijk blikveld. Inaugural Address 
(in Dutch). University of Limburg, Maastricht (1988) 

3. Carmel, D., Markovitch, S.: Learning models of opponent’s strategies in game 
playing. In: Proceedings AAAI Fall Symposium on Games: Planning and Learning, 
Raleigh, NC (1993) 140-147 

4. Iida, H., Uiterwijk, J.W.H.M., Herik, H.J. van den: Opponent-model search. 
Technical Report CS 93-03, Universiteit Maastricht, Maastricht, The Netherlands (1993) 

5. Iida, H., Uiterwijk, J.W.H.M., Herik, H.J. van den, Herschberg, I.S.: Potential 
applications of opponent-model search. Part 1, the domain of applicability. ICCA 
Journal 16(4) (1993) 201-208. Part 2, risks and strategies. ICCA Journal 17(1) 
(1994) 10-14 

6. Carmel, D., Markovitch, S.: Pruning algorithms for multi-model adversary search. 
Artificial Intelligence 99 (1998) 325-355 

7. Donkers, H.H.L.M.: Nosce Hostem - Searching With Opponent Models. PhD thesis, 
Universiteit Maastricht, Maastricht, The Netherlands (2003) 

8. Iida, H., Kotani, I., Uiterwijk, J.W.H.M., Herik, H.J. van den: Gains and risks of OM 
Search. In Herik, H.J. van den, Uiterwijk, J.W.H.M., eds.: Advances in Computer 
Chess 8, Maastricht, The Netherlands, Universiteit Maastricht (1997) 153-165 

9. Donkers, H.H.L.M., Uiterwijk, J.W.H.M., Herik, H.J. van den: Admissibility in 
opponent-model search. Information Sciences 154(3-4) (2003) 119-140 




Database Research Issues in a WWW and GRIDs World 



Keith G. Jeffery 

Director, IT and Head, Information Technology Department 
CCLRC Rutherford Appleton Laboratory 
Chilton, Didcot, OXON OX11 0QX UK 
k.g.jeffery@rl.ac.uk 

http://www.itd.clrc.ac.uk/Person/K.G.Jeffery 



Abstract. The WWW has made information update fast and easy, and (through 
search engines such as Google) retrieval fast and easy. The emerging GRIDs 
architecture offers the end-user complete solutions to their simple request 
involving data and information, computation and processing, display and 
distribution. By comparison conventional database systems and their user 
interfaces appear clumsy and difficult. Nonetheless, experience with WWW has 
taught us that fast and easy can also equate with information that is inaccurate, 
imprecise, incomplete and irrelevant. To overcome these problems there is 
intensive research on 'the semantic web' and 'the web of trust'. The GRIDs 
environment is being developed to include Computer Science fundamentals in 
handling data, information and knowledge. The key aspects are representativity 
of the data and information - accuracy, precision, structure (syntax), meaning 
(semantics) - and expressivity of the languages to represent and manipulate the 
data, information and knowledge - syntax, semantics. There are related issues 
of security and trust, of heterogeneity and distribution and of scheduling and 
performance. The key architectural components are metadata, agents and 
brokers. Access to the GRIDs environment will be from ambient computing 
clients; this raises a host of new problems in security and performance and in 
information summarisation and presentation. There remains an exciting active 
research agenda for database technology. 



1 Introduction 

There is an argument that database R&D (research and development) - or more 
generally ISE (Information Systems Engineering) R&D - has not kept pace with the 
user expectations raised by WWW. Tim Berners-Lee threw down the challenge of the 
semantic web and the web of trust [1]. However, the GRIDs concept [6] placed 
database R&D (ISE R&D) back in the forefront. The EC (European Commission) has 
argued for the information society, the knowledge society and the ERA (European 
Research Area) - all of which are dependent on database R&D in the ISE sense. 

It is time for the database community (in the widest sense, i.e. the information 
systems engineering community) to take stock of the research challenges and plan 
a campaign to meet them with excellent solutions, not only academically or 
theoretically correct but also well-engineered for end-user acceptance and use. 



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 9-21, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 







2 GRIDs 



2.1 The Idea 

In 1998-1999 the UK Research Council community was proposing future 
programmes for R&D. The author was asked to propose an integrating IT 
architecture [6]. The proposal was based on concepts including distributed computing, 
metacomputing, metadata, middleware, client-server migrating to three-layer 
architectures and knowledge-based assists. The novelty lay in the integration of 
various techniques into one architectural framework. 



2.2 The Requirement 

The UK Research Council community of researchers was facing several IT-based 
problems. Their ambitions for scientific discovery included post-genomic discoveries, 
climate change understanding, oceanographic studies, environmental pollution 
monitoring and modelling, precise materials science, studies of combustion processes, 
advanced engineering, pharmaceutical design, and particle physics data handling and 
simulation. They needed more processor power, more data storage capacity, better 
analysis and visualisation - all supported by easy-to-use tools controlled through an 
intuitive user interface. 



2.3 Architecture Overview 

The architecture proposed consists of three layers (Fig. 1). The computation / data 
grid has supercomputers, large servers, massive data storage facilities and specialised 
devices and facilities (e.g. for VR (Virtual Reality)) all linked by high-speed 
networking and forms the lowest layer. The main functions include compute load 
sharing / algorithm partitioning, resolution of data source addresses, security, 
replication and message rerouting. The information grid is superimposed on the 
computation / data grid and resolves homogeneous access to heterogeneous 
information sources mainly through the use of metadata and middleware. Finally, the 
uppermost layer is the knowledge grid which utilises knowledge discovery in 
database technology to generate knowledge and also allows for representation of 
knowledge through scholarly works, peer-reviewed (publications) and grey literature, 
the latter especially hyperlinked to information and data to sustain the assertions in 
the knowledge. 

The concept is based on the idea of a uniform landscape within the GRIDs domain, 
the complexity of which is masked by easy-to-use interfaces. To this facility are 
connected external appliances - ranging from supercomputers, storage access 
networks, data storage robots, specialised visualisation and VR systems, data sensors 
and detectors (e.g. on satellites) to user client devices such as workstations and PDAs 
(Personal Digital Assistants). The connection between the external appliances and the 
GRIDs domain is through agents, supported by metadata, representing the appliance 
(and thus continuously available to the GRIDs systems). These representative agents 
handle credentials of the end-user in their current role, appliance characteristics and 
interaction preferences (for both user client appliances and service appliances), 
preference profiles and associated organisational information. These agents interact 
with other agents in the usual way via brokers to locate services and negotiate use. 
The key aspect is that all the agent interaction is based upon available metadata. 

Fig. 1. The 3-Layer GRIDs Architecture 



2.4 The GRID 

In 1998 - in parallel with the initial UK thinking on GRIDs - Ian Foster and Carl 
Kesselman published a collection of papers in a book generally known as ‘The GRID 
Bible’ [4]. The essential idea is to connect together supercomputers to provide more 
power - the metacomputing technique. However, the major contribution lies in the 
systems and protocols for compute resource scheduling. Additionally, the designers of 
the GRID realised that these linked supercomputers would need fast data feeds so 
developed GRIDFTP. Finally, basic systems for authentication and authorisation are 
described. The GRID has encompassed the use of SRB (Storage Resource Broker) 
from SDSC (San Diego Supercomputer Centre) for massive data handling. SRB has 
its proprietary metadata system to assist in locating relevant data resources. It also 
uses LDAP as its directory of resources. The GRID corresponds to the lowest grid 
layer (computation / data layer) of the GRIDs architecture. 












3 The GRIDs Architecture 



3.1 Introduction 

The idea behind GRIDs is to provide an IT environment that interacts with the user to 
determine the user requirement for service and then satisfies that requirement across 
a heterogeneous environment of data stores, processing power, special facilities for 
display and data collection systems thus making the IT environment appear 
homogeneous to the end-user. 



[Figure: the GRIDs Environment and its three external components: USER (human or 
another system), SOURCE (data, information, software) and RESOURCE (computer, 
detector, sensor, VR facility)] 

Fig. 2. The GRIDs Components 

Referring to Fig. 2, the major components external to the GRIDs environment are: 

a) users: each being a human or another system; 

b) sources: data, information or software; 

c) resources: such as computers, sensors, detectors, visualisation or VR (virtual 
reality) facilities. 

Each of these three major components is represented continuously and actively 
within the GRIDs environment by: 

1) metadata: which describes the external component and which is changed with 
changes in circumstances through events; 

2) an agent: which acts on behalf of the external component, representing it within the 
GRIDs environment. 












As a simple example, the agent could be regarded as the answering service of 
a person’s mobile phone and the metadata as the instructions given to the service such 
as ‘divert to service when busy’ and / or ‘divert to service if unanswered’. 
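
The analogy can be made concrete with a small sketch. The class and field names below (`DivertMetadata`, `AnsweringAgent`) are invented for illustration and do not come from the GRIDs proposal; the point is only that the agent's behaviour is driven by metadata, and that the metadata may change with circumstances.

```python
from dataclasses import dataclass

@dataclass
class DivertMetadata:
    """Instructions given to the answering service (hypothetical fields)."""
    divert_when_busy: bool = True
    divert_if_unanswered: bool = True

class AnsweringAgent:
    """Acts for the phone owner inside the environment."""

    def __init__(self, metadata: DivertMetadata):
        self.metadata = metadata

    def handle_call(self, owner_busy: bool, owner_answers: bool) -> str:
        if owner_busy:
            return "divert" if self.metadata.divert_when_busy else "busy tone"
        if not owner_answers:
            return "divert" if self.metadata.divert_if_unanswered else "ring out"
        return "connect"

agent = AnsweringAgent(DivertMetadata(divert_if_unanswered=False))
print(agent.handle_call(owner_busy=True, owner_answers=False))   # divert
print(agent.handle_call(owner_busy=False, owner_answers=False))  # ring out

# An event changes the metadata; the agent's behaviour follows.
agent.metadata.divert_if_unanswered = True
print(agent.handle_call(owner_busy=False, owner_answers=False))  # divert
```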

Finally there is a component which acts as a ‘go between’ between the agents. 
These are brokers which, as software components, act much in the same way as 
human brokers by arranging agreements and deals between agents, by acting 
themselves (or using other agents) to locate sources and resources, to manage data 
integration, to ensure authentication of external components and authorisation of 
rights to use by an authenticated component and to monitor the overall system. 

From this it is clear that the key components are the metadata, the agents and the 
brokers. 



3.2 Metadata 

Metadata is data about data [7]. An example might be a product tag attached to a 
product (e.g. a tag attached to a piece of clothing) that is available for sale. The 
metadata on the product tag tells the end-user (human considering purchasing the 
article of clothing) data about the article itself - such as the fibres from which it is 
made, the way it should be cleaned, its size (possibly in different classification 
schemes such as European, British, American) and maybe style, designer and other 
useful data. The metadata tag may be attached directly to the garment, or it may 
appear in a catalogue of clothing articles offered for sale (or, more usually, both). The 
metadata may be used to make a selection of potentially interesting articles of 
clothing before the actual articles are inspected, thus improving convenience. Today 
this concept is widely used. Much e-commerce consists of B2C (Business to 
Customer) transactions based on an online catalogue (metadata) of goods offered. 
One well-known example is www.amazon.com. 

What is metadata to one application may be data to another. For example, an 
electronic library catalogue card is metadata to a person searching for a book on a 
particular topic, but data to the catalogue system of the library which will be grouping 
books in various ways: by author, classification code, shelf position, title - depending 
on the purpose required. 

It is increasingly accepted that there are several kinds of metadata. The 
classification proposed (Fig. 3) is gaining wide acceptance and is detailed below. 



Schema Metadata. Schema metadata constrains the associated data. It defines the 
intension whereas instances of data are the extension. From the intension a theoretical 
universal extension can be created, constrained only by the intension. Conversely, any 
observed instance should be a subset of the theoretical extension and should obey the 
constraints defined in the intension (schema). One problem with existing schema 
metadata (e.g. schemas for relational DBMS) is that they lack certain intensional 
information that is required [8]. Systems for information retrieval based on, e.g. the 
SGML (Standard Generalised Markup Language) DTD (Document Type Definition) 
experience similar problems. 









It is noticeable that many ad hoc systems for data exchange between systems send 
with the data instances a schema that is richer than that in conventional DBMS - to 
assist the software (and people) handling the exchange to utilise the exchanged data to 
best advantage. 
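
A minimal sketch of the intension / extension distinction follows. The schema, the attribute names and the `conforms` helper are assumptions made for illustration; real schema languages are considerably richer.

```python
# Intension: a (hypothetical) schema giving each attribute a type and a constraint.
schema = {
    "title":  (str, lambda v: len(v) > 0),
    "year":   (int, lambda v: 1450 <= v <= 2100),
    "copies": (int, lambda v: v >= 0),
}

def conforms(record: dict, schema: dict) -> bool:
    """True if the record has exactly the schema's attributes and obeys its constraints."""
    if set(record) != set(schema):
        return False
    return all(
        isinstance(record[name], typ) and check(record[name])
        for name, (typ, check) in schema.items()
    )

# Extension: the observed instances.
observed = [
    {"title": "Weaving the Web", "year": 1999, "copies": 3},
    {"title": "", "year": 1999, "copies": 1},    # violates the title constraint
    {"title": "The Grid", "year": 1998},         # missing attribute
]
valid = [r for r in observed if conforms(r, schema)]
print(len(valid), "of", len(observed), "instances obey the intension")
```

Constraints such as the year range are written here as arbitrary functions, which is the kind of intensional information that the text above finds lacking in conventional schemas.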



Navigational Metadata. Navigational metadata provides the pathway or routing to 
the data described by the schema metadata or associative metadata. In the RDF model 
it is a URL (Uniform Resource Locator), or more accurately, a URI (Uniform 
Resource Identifier). With increasing use of databases to store resources, the most 
common navigational metadata now is a URL with associated query parameters 
embedded in the string to be used by CGI (Common Gateway Interface) software or 
proprietary software for a particular DBMS product or DBMS-Webserver software 
pairing. 
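
As an illustration, such a URL can be taken apart with Python's standard `urllib.parse` module. The address below is hypothetical (example.org is reserved for documentation), as are the script path and the parameter names.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical navigational metadata: a URL with embedded query parameters.
uri = "http://www.example.org/cgi-bin/catalogue?author=Jeffery&year=2004"

parts = urlsplit(uri)
query = parse_qs(parts.query)

print(parts.netloc)   # host that serves the resource
print(parts.path)     # program (e.g. a CGI script) that receives the query
print(query)          # parameters handed on to the database
```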

The navigational metadata describes only the physical access path. Naturally, 
associated with a particular URI are other properties such as: 

a) security and privacy (e.g. a password required to access the target of the URI); 

b) access rights and charges (e.g. does one have to pay to access the resource at the 
URI target); 

c) constraints over traversing the hyperlink mapped by the URI (e.g. the target of 
the URI is only available if previously a field on a form has been input with 
a value between 10 and 20). Another example would be the hypermedia 
equivalent of referential integrity in a relational database; 

d) semantics describing the hyperlink such as ‘the target resource describes the son 
of the person described in the origin resource’. 









However, these properties are best described by associative metadata which then 
allows more convenient co-processing in context of metadata describing both 
resources and hyperlinks between them and - if appropriate - events. 



Associative Metadata. In the data and information domain associative metadata can 
describe: 

a) a set of data (e.g. a database, a relation (table) or a collection of documents or 
a retrieved subset). An example would be a description of a dataset collected as 
part of a scientific mission; 

b) an individual instance (record, tuple, document). An example would be a library 
catalogue record describing a book; 

c) an attribute (column in a table, field in a set of records, named element in a set of 
documents). An example would be the accuracy / precision of instances of the 
attribute in a particular scientific experiment; 

d) domain information (e.g. value range) of an attribute. An example would be the 
range of acceptable values in a numeric field such as the capacity of a car engine 
or the list of valid values in an enumerated list such as the list of names of car 
manufacturers; 

e) a record / field intersection unique value (i.e. value of one attribute in one 
instance). This would be used to explain an apparently anomalous value. 

In the relationship domain, associative metadata can describe relationships between 
sets of data e.g. hyperlinks. Associative metadata can - with more flexibility and 
expressivity than available in e.g. relational database technology or hypermedia 
document system technology - describe the semantics of a relationship, the 
constraints, the roles of the entities (objects) involved and additional constraints. 

In the process domain, associative metadata can describe (among other things) the 
functionality of the process, its external interface characteristics, restrictions on 
utilisation of the process and its performance requirements / characteristics. 

In the event domain, associative metadata can describe the event, the temporal 
constraints associated with it, the other constraints associated with it and actions 
arising from the event occurring. 

Associative metadata can also be personalised: given clear relationships between 
them that can be resolved automatically and unambiguously, different metadata 
describing the same base data may be used by different users. 

Taking an orthogonal view over these different kinds of information system objects 
to be described, associative metadata may be classified as follows: 

1) descriptive: provides additional information about the object to assist in 
understanding and using it; 

2) restrictive: provides additional information about the object to restrict access to 
authorised users and is related to security, privacy, access rights, copyright and 
IPR (Intellectual Property Rights); 

3) supportive: a separate and general information resource that can be cross-linked 
to an individual object to provide additional information e.g. translation to 
a different language, super- or sub-terms to improve a query - the kind of 
support provided by a thesaurus or domain ontology. 

Most examples of metadata in use today include some components of most of these 
kinds, but they are neither structured formally nor specified formally. As a result the 
metadata tends to be of limited use for automated operations - particularly 
interoperation - thus requiring additional human interpretation. 
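
A formally structured representation of the three kinds (descriptive, restrictive, supportive) might look like the following sketch. The class names, the example dataset and the `allowed_roles` convention are assumptions introduced here, not part of any metadata standard.

```python
from dataclasses import dataclass
from enum import Enum

class Kind(Enum):
    DESCRIPTIVE = "descriptive"
    RESTRICTIVE = "restrictive"
    SUPPORTIVE = "supportive"

@dataclass
class AssociativeMetadata:
    target: str      # the object described (dataset, record, attribute, ...)
    kind: Kind
    name: str
    value: object

# Hypothetical metadata attached to one dataset.
catalogue = [
    AssociativeMetadata("ocean_temps", Kind.DESCRIPTIVE, "precision_celsius", 0.01),
    AssociativeMetadata("ocean_temps", Kind.RESTRICTIVE, "allowed_roles", {"researcher"}),
    AssociativeMetadata("ocean_temps", Kind.SUPPORTIVE, "thesaurus", "marine-terms"),
]

def may_access(target: str, role: str) -> bool:
    """Apply every restrictive 'allowed_roles' entry recorded for the target.

    In this sketch a target without restrictive metadata is open to all roles.
    """
    rules = [m for m in catalogue
             if m.target == target and m.kind is Kind.RESTRICTIVE
             and m.name == "allowed_roles"]
    return all(role in m.value for m in rules)

print(may_access("ocean_temps", "researcher"))
print(may_access("ocean_temps", "guest"))
```

Because the kind is an explicit field, software can select the restrictive entries without human interpretation, which is the property the paragraph above finds missing in current practice.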



3.3 Agents 

Agents operate continuously and autonomously and act on behalf of the external 
component they represent. They interact with other agents via brokers, whose task it 
is to locate suitable agents for the requested purpose. An agent’s actions are 
controlled to a large extent by the associated metadata which should include either 
instructions, or constraints, such that the agent can act directly or deduce what action 
is to be taken. Each agent is waiting to be ‘woken up’ by some kind of event; on 
receipt of a message the agent interprets the message and - using the metadata as 
parametric control - executes the appropriate action, either communicating with the 
external component (user, source or resource) or with brokers as a conduit to other 
agents representing other external components. 
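
The wake-up behaviour described above can be sketched with a message queue. Everything named here (the `Agent` class, the message types, the action names) is hypothetical; the sketch only shows metadata acting as parametric control over what the agent does with each message.

```python
import queue

class Agent:
    """Hypothetical agent: sleeps until a message arrives, then acts."""

    def __init__(self, metadata: dict):
        self.metadata = metadata          # parametric control
        self.inbox = queue.Queue()        # events wake the agent up
        self.log = []

    def run_once(self, timeout: float = 0.1) -> bool:
        """Handle one message; return False if none arrived in time."""
        try:
            message = self.inbox.get(timeout=timeout)
        except queue.Empty:
            return False
        # The metadata decides the action; unknown requests go to a broker.
        action = self.metadata["actions"].get(message["type"], "ask_broker")
        self.log.append((message["type"], action))
        return True

source_agent = Agent({"actions": {"describe": "send_schema",
                                  "retrieve": "run_query"}})
source_agent.inbox.put({"type": "retrieve", "body": "all 1999 records"})
source_agent.inbox.put({"type": "translate", "body": "units"})
while source_agent.run_once():
    pass
print(source_agent.log)
```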

An agent representing an end-user accepts a request from the end-user and interacts 
with the end-user to refine the request (clarification and precision), first based on the 
user metadata and then based on the results of a first attempt to locate (via brokers 
and other agents) appropriate sources and resources to satisfy the request. The 
proposed activity within GRIDs for that request is presented to the end-user as 
a ‘deal’ with any costs, restrictions on rights of use etc. Assuming the user accepts the 
offered deal, the GRIDs environment then satisfies it using appropriate resources and 
sources and finally sends the result back to the user agent where - again using 
metadata - end-user presentation is determined and executed. 

An agent representing a source will - with the associated metadata - respond to 
requests (via brokers) from other agents concerning the data or information stored, or 
the properties of the software stored. Assuming the deal with the end-user is accepted, 
the agent performs the retrieval of data requested, or supply of software requested. 

An agent representing a resource - with the associated metadata - responds to 
requests for utilisation of the resource with details of any costs, restrictions and 
relevant capabilities. Assuming the deal with the end-user is accepted the resource 
agent then schedules its contribution to providing the result to the end-user. 



3.4 Brokers 

Brokers act as ‘go betweens’ between agents. Their task is to accept messages from 
an agent which request some external component (source, resource or user), identify 
an external component that can satisfy the request by its agent working with its 
associated metadata and either put the two agents in direct contact or continue to act 
as an intermediary, possibly invoking other brokers (and possibly agents) to handle, 
for example, measurement unit conversion or textual word translation. 

Other brokers perform system monitoring functions including overseeing 
performance (and if necessary requesting more resources to contribute to the overall 
system e.g. more networking bandwidth or more compute power). They may also 
monitor usage of external components both for statistical purposes and possibly for 
any charging scheme. 







3.5 The Components Working Together 

Now let us consider how the components interact. An agent representing a user may 
request a broker to find an agent representing another external component such as 
a source or a resource. The broker will usually consult a directory service (itself 
controlled by an agent) to locate potential agents representing suitable sources or 
resources. The information will be returned to the requesting (user) agent, probably 
with recommendations as to order of preference based on criteria concerning the 
offered services. The user agent matches these against preferences expressed in the 
metadata associated with the user and makes a choice. The user agent then makes the 
appropriate recommendation to the end-user who in turn decides to ‘accept the deal’ 
or not. 
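
The interaction just described can be sketched in a few lines. The directory contents, the `Offer` fields and the ranking criteria are assumptions chosen for illustration; an actual broker would negotiate over far richer metadata.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    agent: str        # name of the agent making the offer
    service: str
    cost: float
    response_s: float

# Hypothetical directory service: service name -> offers from source/resource agents.
DIRECTORY = {
    "climate-data": [
        Offer("archive-A", "climate-data", cost=5.0, response_s=2.0),
        Offer("archive-B", "climate-data", cost=0.0, response_s=30.0),
        Offer("archive-C", "climate-data", cost=12.0, response_s=0.5),
    ],
}

def broker_find(service: str) -> list:
    """Broker: consult the directory and return offers, cheapest first."""
    return sorted(DIRECTORY.get(service, []), key=lambda o: o.cost)

def user_agent_choose(offers: list, user_metadata: dict):
    """User agent: keep offers within the user's preferences, pick the fastest."""
    acceptable = [o for o in offers
                  if o.cost <= user_metadata["max_cost"]
                  and o.response_s <= user_metadata["max_wait_s"]]
    return min(acceptable, key=lambda o: o.response_s, default=None)

user_metadata = {"max_cost": 10.0, "max_wait_s": 10.0}
deal = user_agent_choose(broker_find("climate-data"), user_metadata)
print("Recommended deal:", deal)     # the end-user still accepts or declines
```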

4 Ambient Computing 

The concept of ambient computing implies that the computing environment is always 
present and available in an even manner. The concept of pervasive computing implies 
that the computing environment is available everywhere and is ‘into everything’. The 
concept of mobile computing implies that the end-user device may be connected even 
when on the move. In general usage of the term, ambient computing implies both 
pervasive and mobile computing. 

The idea, then, is that an end-user may find herself connected (or connectable - she 
may choose to be disconnected) to the computing environment all the time. The 
computing environment may involve information provision (access to database and 
web facilities), office functions (calendar, email, directory), desktop functions (word 
processing, spreadsheet, presentation editor), perhaps project management software 
and systems specialised for her application needs - accessed from her end-user device 
connected back to ‘home base’ so that her view of the world is as if at her desk. In 
addition entertainment subsystems (video, audio, games) should be available. 

A typical configuration might comprise: 

a) a headset with earphone(s) and microphone for audio communication, connected 
by Bluetooth wireless local connection to 

b) a PDA (personal digital assistant) with small screen, numeric/text keyboard (like 
a telephone), GSM/GPRS (mobile phone) connections for voice and data, 
wireless LAN connectivity and ports for connecting sensor devices (to measure 
anything close to the end-user) in turn connected by Bluetooth to 

c) an optional notebook computer carried in a backpack (but taken out for use in 
a suitable environment) with conventional screen, keyboard, large hard disk and 
connectivity through GSM/GPRS, wireless LAN, cable LAN and dial-up 
telephone. 

The end-user would perhaps use only (a) and (b) (or maybe (b) alone using the built-in 
speaker and microphone) in a social or professional context as mobile phone and 
‘filofax’, and as entertainment centre, with or without connectivity to ‘home base’ 
servers and IT environment. For more traditional working requiring keyboard and 
screen the notebook computer would be used, probably without the PDA. The two 
might be used together with data collection validation / calibration software on the 
notebook computer and sensors attached to the PDA. 







The balance between that (data, software) which is on servers accessed over the 
network and that which is on (one of) the end-user device(s) depends on the mode of 
work, the required speed of response and the likelihood of interrupted connections. Clearly 
the GRIDs environment is an ideal one for such a user to connect to. 

Such a configuration is clearly useful for a ‘road warrior’ (travelling salesman), for 
emergency services such as firefighters or paramedics, for businessmen, for 
production industry managers, for the distribution / logistics industry (warehousing, 
transport, delivery), for scientists in the field, and also for leisure activities such as 
mountain walking, visiting an art gallery, locating a restaurant or visiting an 
archaeological site. 



5 The Challenges 

Such an IT architectural environment inevitably poses challenging research issues. 
The major ones are: 



5.1 Metadata 

Since metadata is critically important for interoperation and semantic understanding, 
there is a requirement for precise and formal representation of metadata to allow 
automated processing. Research is required into the metadata representation language 
expressivity in order to represent the entities user, source, resource. For example, the 
existing Dublin Core Metadata standard [11] is machine-readable but not machine- 
understandable, and furthermore mixes navigational, associative descriptive and 
associative restrictive metadata. A formal version has been proposed [2]. 



5.2 Agents 

There is an interesting research area concerning the generality or specificity of agents. 
Agents could be specialised for a particular task or generalised and configured 
dynamically for the task by metadata. Furthermore, agents may well need to be 
reactive and dynamically reconfigured by events / messages. This would cause 
a designer to lean towards general agents with dynamic configuration, but there are 
performance, reliability and security issues. In addition there are research issues 
concerning the syntax and semantics of messages passed between agents and brokers 
to ensure optimal representation with appropriate performance and security. 



5.3 Brokers 

A similar research question is posed for brokers - are they generalised and dynamic 
or specific? However, brokers have not just representational functions, they have also 
to negotiate. The degree of autonomy becomes the key research issue: can the broker 
decide by itself or does it solicit input from the external entity (user, source, resource) 
via its agent and metadata? The broker will need general strategic knowledge 
(negotiation techniques) but the way a broker uses the additional information supplied 
by the agents representing the entities could be a differentiating factor and therefore 
a potential business benefit. In addition there are research issues concerning the 
syntax and semantics of messages passed between brokers to ensure optimal 
representation with appropriate performance and security. 



5.4 Security 

Security is an issue in any system, and particularly in a distributed system. It becomes 
even more important if the system is a common marketplace with great heterogeneity 
of purpose and intent. Security takes the following forms: 

a) prevention of unauthorised access: this requires authentication of the user, 
authorisation of the user to access or use a source or resource and provision or 
denial of that access. The current heterogeneity of authentication and 
authorisation mechanisms provides many opportunities for deliberate or unwitting 
security exposure; 

b) ensuring availability of the source or resource: this requires techniques such as 
replication, mirroring and hot or warm failover. There are deep research issues in 
transactions and rollback/recovery and optimisation; 

c) ensuring continuity of service: this relates to (b) but includes additional fallback 
procedures and facilities and there are research issues concerning the optimal 
(cost-effective) assurance of continuity. 
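
The sequence in (a), authentication followed by authorisation and then provision or denial of access, can be sketched as follows. The credential store, the rights table and all names are hypothetical; the hashing uses the standard-library `hashlib.pbkdf2_hmac`, and the iteration count is an arbitrary choice for the example rather than a recommendation.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical credential store and access-rights table.
_salt = os.urandom(16)
USERS = {"alice": (_salt, hash_password("correct horse", _salt))}
RIGHTS = {("alice", "ocean_temps"): {"read"}}

def authenticate(user: str, password: str) -> bool:
    """Is the user who they claim to be?"""
    if user not in USERS:
        return False
    salt, stored = USERS[user]
    return hmac.compare_digest(stored, hash_password(password, salt))

def authorise(user: str, resource: str, action: str) -> bool:
    """May this (already authenticated) user perform the action?"""
    return action in RIGHTS.get((user, resource), set())

def access(user: str, password: str, resource: str, action: str) -> str:
    if not authenticate(user, password):
        return "denied: not authenticated"
    if not authorise(user, resource, action):
        return "denied: not authorised"
    return "granted"

print(access("alice", "correct horse", "ocean_temps", "read"))
print(access("alice", "correct horse", "ocean_temps", "write"))
```

In a heterogeneous environment each source or resource may implement these two steps differently, which is where the exposure mentioned in (a) arises.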

In the case of interrupted communication there is a requirement for synchronisation 
of the end-user’s view of the system between that which is required on the PDA and / 
or laptop and the servers. 

There are particular problems with wireless communications because of 
interception. Encryption of sensitive transmissions is available but there remain 
research issues concerning security assurance. 



5.5 Privacy 

The privacy issues concern essentially the tradeoff of personal information provision 
for intelligent system reaction. There are research issues on the optimal balance for 
particular end-user requirements. Furthermore, data protection legislation in countries 
varies and there are research issues concerning the requirement to provide data or to 
conceal data. 



5.6 Trust 

When any end-user purchases online (e.g. a book from www.amazon.com) there is 
trust that the supplier will deliver the goods and that the purchaser’s credit card 
information is valid. This concept requires much extension in the case of contracts 
for supply of engineered components for assembly into e.g. a car. The provision of an 
e-marketplace brings with it the need for e-tendering, e-contracts, e-payments, 
e-guarantees as well as opportunities to re-engineer the business process for 
effectiveness and efficiency. This is currently a very hot research topic since it 
requires the representation in an IT system of artefacts (documents) associated with 
business transactions. 



5.7 Interoperability 

There is a clear need to provide the end-user with homogeneous access to 
heterogeneous information sources. This involves schema reconciliation / mapping and 
associated transformations. Associated with this topic are requirements for languages 
that are more representative (of the entities / objects in the real world) and more 
expressive (in expressing the transformations or operations). Recent R&D [10], [9] 
has indicated that graphs provide a neutral basis for the syntax with added value in 
graph properties such that structural properties may be used. 
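
A minimal sketch of schema mapping with an associated transformation is given below. It uses plain dictionaries rather than the graph-based representation mentioned above, and the two source schemas, the target attribute names and the `to_common` helper are all assumptions made for illustration.

```python
# Two hypothetical sources describing the same kind of object differently.
source_a = {"Name": "Station 7", "TempF": 50.0}
source_b = {"station": "Station 9", "temperature_c": 12.5}

# One mapping per source: target attribute -> (source attribute, transformation).
MAPPINGS = {
    "a": {"station": ("Name", str),
          "temp_c": ("TempF", lambda f: (f - 32.0) * 5.0 / 9.0)},
    "b": {"station": ("station", str),
          "temp_c": ("temperature_c", float)},
}

def to_common(record: dict, source: str) -> dict:
    """Rewrite a source record into the common (target) schema."""
    return {target: transform(record[attr])
            for target, (attr, transform) in MAPPINGS[source].items()}

unified = [to_common(source_a, "a"), to_common(source_b, "b")]
for row in unified:
    print(row)
```

The end-user then queries one homogeneous set of records; the attribute renaming and the unit conversion are confined to the mappings.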



5.8 Data Quality 

The purpose of data, especially when structured in context as information, is to 
represent the world of interest. There are real research issues in ensuring this is true - 
especially when the data is incomplete or uncertain, when the data is subject to certain 
precision, accuracy and associated calibration constraints or when only by knowing 
its provenance can a user utilise it confidently. 



5.9 Performance 

Because the characteristics of the data / information, software and processing power on 
each node are known, the architecture opens the possibility of generating optimal 
execution plans. Refinements involve data movement (expensive if the volumes are 
large) or program code movement (security implications) to appropriate nodes. 
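
The trade-off between moving data and moving code can be sketched with a deliberately simple cost model. The model (transfer time as size divided by bandwidth), the function names and the security flag are assumptions introduced here; a real planner would also weigh processing power, load and charges.

```python
def transfer_seconds(size_mb: float, bandwidth_mb_per_s: float) -> float:
    """Estimated time to ship `size_mb` over the given link."""
    return size_mb / bandwidth_mb_per_s

def plan(data_mb, code_mb, bandwidth_mb_per_s, code_allowed_on_data_node=True):
    """Return ('move code' | 'move data', estimated transfer time in seconds)."""
    move_data = transfer_seconds(data_mb, bandwidth_mb_per_s)
    move_code = transfer_seconds(code_mb, bandwidth_mb_per_s)
    # Code movement is only an option if the data node's security policy permits it.
    if code_allowed_on_data_node and move_code < move_data:
        return ("move code", move_code)
    return ("move data", move_data)

# 10 GB of data, a 5 MB program and a 100 MB/s link (illustrative figures).
print(plan(10_000, 5, 100))
print(plan(10_000, 5, 100, code_allowed_on_data_node=False))
```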



6 Conclusion 

The GRIDs architecture will provide an IT infrastructure to revolutionise and expedite 
the way in which we do business and achieve leisure. The Ambient Computing 
architecture will revolutionise the way in which the IT infrastructure intersects with 
our lives, both professional and social. The two architectures in combination will 
provide the springboard for the greatest advances yet in Information Technology. This 
can only be achieved by excellent R&D leading to commercial take-up and 
development of suitable products, to agreed standards, ideally within an environment 
such as W3C (the World Wide Web Consortium). The current efforts in GRID 
computing have moved some way away from metacomputing and towards the 
architecture described here with the adoption of OGSA (Open Grid Services
Architecture). However, there is a general feeling that Next Generation GRID
requires an architecture rather like that described here, as reported in the Report of the
EC Expert Group on the subject [3].







Acknowledgements. Some of the material presented here has appeared in previous 
papers by the author. Although the author remains responsible for the content, many 
of the ideas have come from fruitful discussions not only with the author’s own team 
at CCLRC-RAL but also with many members of the UK science community 
(requirements) and the UK Computer Science / Information systems community. The 
author has also benefited from discussions in the contexts of ERCIM 
(www.ercim.org) and W3C (www.w3.org).



References 

1. Berners-Lee, T.: Weaving the Web. 256 pp. Harper, San Francisco (September 1999),
ISBN 0062515861

2. http://purl.oclc.org/dc/

3. www.cordis.lu/ist/grids/index.htm

4. Foster, I., Kesselman, C. (eds.): The Grid: Blueprint for a New Computing Infrastructure.
Morgan Kaufmann (1998)

5. Jeffery, K.G.: An Architecture for Grey Literature in a R&D Context. Proceedings GL'99
(Grey Literature) Conference, Washington DC (October 1999)

6. http://www.konbib.nl/grcvnct/framc4.htm

7. Original Paper on GRIDs, unpublished, available from the author

8. Jeffery, K.G.: Metadata. In: Brinkkemper, J., Lindencrona, E., Solvberg, A. (eds.): Information
Systems Engineering. Springer-Verlag, London (2000), ISBN 1-85233-317-0

9. Jeffery, K.G., Hutchinson, E.K., Kalmus, J.R., Wilson, M.D., Behrendt, W., Macnee, C.A.:
A Model for Heterogeneous Distributed Databases. Proceedings BNCOD12, July 1994.
LNCS 826, Springer-Verlag (1994) 221-234

10. Kohoutkova, J.: Structured Interfaces for Information Presentation. PhD Thesis, Masaryk
University, Brno, Czech Republic

11. Skoupy, K., Kohoutkova, J., Benesovsky, M., Jeffery, K.G.: Hypermedata Approach: A Way to
Systems Integration. Proceedings Third East European Conference, ADBIS'99, Maribor,
Slovenia, September 13-16, 1999. Published: Institute of Informatics, Faculty of Electrical
Engineering and Computer Science, Smetanova 17, IS-2000 Maribor, Slovenia, 1999,
ISBN 86-435-0285-5, 9-15

12. http://www.dublincore.org/ 




Integration, Diffusion, and Merging in Information 
Management Discipline 



Vijay Kumar 

SCE, Computer Networking, University of Missouri-Kansas City 
5100 Rockhill, Kansas City, MO 64110, USA 
kumarv@umkc.edu



Abstract. We observe that information is the life force through which we
interact with our environment. The dynamic state of the world is maintained by
information management. These observations motivated us to develop the
concept of a fully connected information space, which we introduce in this paper.
We discuss its structure and properties and present our research work and
contributions towards its maintenance. We also speculate on the future of this
information space and our mode of interaction with it.



1 Introduction 

The dynamic state of the world is governed by the laws of nature, which we
cannot alter. The only way to maintain our lives in this dynamic
environment is to synchronize our activities with the activities of nature. For example,
we must learn about natural disasters in good time and plan our activities
accordingly. In many situations activities are time bound and, therefore, we must
define time constraints (TC) for our approach to handling such activities. For example,
we retrieve/acquire information about a possible earthquake from multiple sources.
We integrate the relevant pieces of information to make sense of them and then we
decide to take some safety measure. We describe the entire process in
three simple steps, which we refer to as Pull-Process-Push (PPP). In the Pull step
information is retrieved from the global information space, in the Process step the
retrieved and integrated information is manipulated, and finally in the Push step the result
and the modified information are sent back to the information space. Unfortunately,
these steps are not easy to perform. Each defines a large research domain, which
grows continuously as our interaction with information changes and as new
requirements arise. It is important to note that the research activities in these domains
are interdisciplinary, with a high degree of cooperation among the majority of technical
disciplines. We confine our discussion to the information management discipline.
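
The three PPP steps and the time constraint can be pictured with a small sketch. It is only an illustration under stated assumptions: the function names, the averaging rule used for integration, and the earthquake readings are all hypothetical and are not taken from the paper.

```python
import time
from typing import Callable, Dict, Iterable, List

Record = Dict[str, float]


def pull(sources: Iterable[Callable[[], Record]]) -> List[Record]:
    """Pull: retrieve one record from every source."""
    return [source() for source in sources]


def process(records: List[Record]) -> Record:
    """Process: integrate the records by averaging each shared field."""
    merged: Dict[str, List[float]] = {}
    for record in records:
        for key, value in record.items():
            merged.setdefault(key, []).append(value)
    return {key: sum(values) / len(values) for key, values in merged.items()}


def push(space: List[Record], result: Record) -> None:
    """Push: write the result back to the shared information space."""
    space.append(result)


def ppp(sources, space: List[Record], time_constraint_s: float) -> bool:
    """Run Pull-Process-Push and report whether the TC was met."""
    start = time.monotonic()
    push(space, process(pull(sources)))
    return time.monotonic() - start <= time_constraint_s


# Hypothetical earthquake reports from two sources (made-up values).
sources = [lambda: {"magnitude": 5.0}, lambda: {"magnitude": 6.0}]
information_space: List[Record] = []
met_deadline = ppp(sources, information_space, time_constraint_s=1.0)
```

In this sketch a missed time constraint is only reported; how a real system should react to it (abort, retry, degrade) is left open.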

To achieve PPP successfully, the information management discipline
initially worked in a modular fashion with little integration. Thus, data processing,
networking, telecommunication, etc., evolved separately and appeared to be
only remotely complementary to each other. Our ever growing need for new technology for
managing information either motivated us or forced us to discover the interrelationships
among these disciplines. Our investigation revealed that they are not only
complementary but highly dependent on each other and essential to building the fully
connected information space we deserve.

P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 22-40, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Fig. 1. A Fully Connected Information Space (figure labels: land and water; under water)

Figure 1 illustrates the fully connected information space we require. A node of
this space, which can be any real world object that has some functionality, is fully and
continuously connected with all other objects. Nearly all these connections are duplex,
so any two parties can exchange consistent information without any temporal or
spatial constraints. This will create a very high degree of concurrent traffic and, if
updates are allowed (as they must be), there will be an extremely high degree of
data and resource (CPU, I/O, etc.) contention. At present there are no concurrency
control and scheduling mechanisms that can satisfactorily manage such traffic.
The information space will inherit not only the problem of concurrency but all other
information management problems, both system and application, and each one will
require an innovative solution. We will have to deal with information space recovery,
database distribution, and so on, in a very different way. These are some of the
problems we present in this paper, and we provide our solutions to the basic ones.



2 Application Domain 

There will be millions of application level problems. A user will retrieve information
from a large number of sources, which must be integrated to make sense of
it. No existing information integration scheme is capable of handling such varied formats. It is
frustrating to deal with the state tax form format, the federal tax form format, admission
form formats, and so on. Some forms ask for first name, middle initial, and last name in this
order, whereas others ask for them in reverse order. It is obvious that such
diversity is likely to increase significantly in the globally shared information space,
and to improve quality of service and user comfort some information integration
interface is highly desirable. Figure 2 presents a possible scenario of information
integration.







Fig. 2. Information integration scenario 



2.1 Medical Informatics 

Let us consider the highly heterogeneous medical informatics domain. The current health care
infrastructure and the services it provides are highly federated. Patients are seen in
emergency departments, specialized clinics, physicians’ offices, and inpatient hospital
environments. Prescriptions are filled in pharmacies, and laboratory and radiographic
information is captured in yet another environment. From the data format viewpoint,
information from each source, whether a device or a human, is represented in a specialized
format that is usually not compatible with the others. The physician reviews each data item
separately and reconciles them in an ad hoc manner that is good enough for a correct
diagnosis. This is not only time consuming but also primitive from the current information
management viewpoint.



Fig. 3. Heterogeneous medical data repository (an Electronic Medical Record fed by six
sources, each with its own data format: voice recorded data, Fv; physical examination
data, Fp; hand recorded history, Fh; X-Ray data, Fx; OCR data, Fo; NMR data, Fn)







Figure 3 visualizes the current scenario in practice. A number of data sources, such
as X-Ray, OCR, etc., have their own specific formats suitable for recording the results
of a patient. A data format such as Fv is usually not compatible, semantically or
syntactically, with the other formats. The data compatibility problem gets worse because of
synonyms and homonyms, which may be present in all or some of the formats. Then
there is the question of false data redundancy, which may not be easily recognizable. For
example, two different patients with the same name may be examined by two different
caregivers, and one is subjected to OCR and the other to X-Ray. Without care, these two
records may be falsely taken as duplicates, which may lead to incorrect billing or
diagnosis. A significant challenge lies in the fact that medical information is related to
people, and people are inherently difficult to identify uniquely. There are a finite
number of combinations of first names and surnames. This leads to significant real-world
duplication of partial or entire names. A further difficulty is that
many people are identified by more than one name, often using a nickname
or preferring a middle name to their given first name. Government
or corporate identifiers such as the SSN (Social Security Number) or a medical record
number do not exist for all people; additional differentiators that are inherent to all
persons, such as DOB (Date of Birth), gender, and race, are limited in their ability to
uniquely identify an individual, again by the finite set of combinations. A positive
DNA identification of individual patients is not practicable in most locations.
Sequencing technology is currently limited and expensive, and the resultant data is
large, in the order of 3 x 10^9 base pairs or 1 GB per patient. This is clearly too large to
be a suitable primary key even if all participating component systems could access
this data.
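
The order of magnitude quoted for the sequence data can be checked with a short calculation. It assumes an uncompressed encoding of 2 bits per base pair (enough to distinguish four nucleotides); this encoding is an assumption made for illustration and is not stated in the text.

```latex
\[
3 \times 10^{9}\ \text{base pairs} \times 2\ \frac{\text{bits}}{\text{base pair}}
  = 6 \times 10^{9}\ \text{bits}
  = 7.5 \times 10^{8}\ \text{bytes}
  \approx 0.75\ \text{GB}
\]
```

Under that assumption the result agrees with the figure of roughly 1 GB per patient.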

Positive identification of an entity (e.g., patient, department, equipment, etc.) 
within a single well-regulated database system is relatively simple and is a foundation 
of relational databases. In a relational model the instance of an entity is represented by 
a row and a primary key is used to uniquely identify the entity. Related relations
are linked together using foreign keys. This scheme, however, only
enforces tuple uniqueness and does not guarantee that a real world entity is not
duplicated within the base relation. For example, multiple instances of Thomas Smith
may occur in a table each with an internally unique identifier as a result of treatments 
at multiple locations, which may not be related. This is desirable if indeed more than 
one real patient is named Thomas Smith. At the architecture level this demonstrates 
the significance of choosing appropriate candidate keys. Identification and duplication 
prevention of entities can be regulated through intelligent applications such as good 
Human Computer Interface (HCI) for data entry and creation of patients through the 
Admission, Discharge, and Transfer (ADT) software. A human operator can make 
decisions regarding the uniqueness of a real patient and distinguish between the 
requirement to create a new patient database instance or the selection of an existing 
record. Significant challenges are introduced, however, when this
process is to be automated, as in the case of entity resolution across a federation or
batch import processes.
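
The difference between tuple uniqueness and real-world uniqueness can be seen in a minimal sketch using an in-memory SQLite database. The table layout, the dates of birth, and the site names are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    " patient_id INTEGER PRIMARY KEY,"  # internally unique identifier
    " name TEXT NOT NULL,"
    " dob TEXT NOT NULL,"
    " site TEXT NOT NULL)"
)
# Hypothetical rows: possibly the same person registered at two locations,
# plus a genuinely different patient with the same name.
conn.executemany(
    "INSERT INTO patient (name, dob, site) VALUES (?, ?, ?)",
    [
        ("Thomas Smith", "1960-05-17", "Clinic A"),
        ("Thomas Smith", "1960-05-17", "Hospital B"),
        ("Thomas Smith", "1982-11-02", "Clinic A"),
    ],
)
# Every row has a distinct primary key, so tuple uniqueness holds.
row_count, key_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT patient_id) FROM patient"
).fetchone()
# Grouping on a candidate key (name, dob) exposes possible duplicates.
suspects = conn.execute(
    "SELECT name, dob, COUNT(*) FROM patient"
    " GROUP BY name, dob HAVING COUNT(*) > 1"
).fetchall()
```

The primary key accepts all three rows without complaint; only the choice of candidate key flags the first two as a possible duplicate, and a human operator or a matching procedure still has to decide whether they are the same person.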

Integration of distributed data sources does not guarantee unique instances of real- 
world entities. The component systems of a federation or data feeds into a repository 
may use different candidate keys for the base entity relations. Several approaches 
have been developed to address this problem ranging from probabilistic key 
matching, to pre-hoc user-defined mapping [1]. In addition to this integration issue,
another major problem that frequently occurs and has not been addressed
satisfactorily is the correctness of data. Existing approaches assume that data accessed or
read by any equipment is correct. Often potential key elements are inconsistent or
incomplete due to the data acquisition method or clinical environment. For example, OCR
(Optical Character Recognition) scanners are prone to character misrecognition. Clinically
induced ambiguity can result from the chaos of acquiring data in emergent situations
or lack of enforcement of standard operating procedures. Often only a physician’s initials
or last name are collected, or non-standard clinical abbreviations are used.
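
Probabilistic key matching of the kind mentioned above can be sketched as a weighted similarity score over candidate key fields. This is a simplified stand-in, not the method of [1] or of the authors: the field weights, the threshold, and the sample records are hypothetical, and the string similarity comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; a real system would estimate them from data.
WEIGHTS = {"name": 0.5, "dob": 0.3, "gender": 0.2}


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1], tolerant of OCR-style character errors."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted agreement over the fields present in both records."""
    total = 0.0
    weight_sum = 0.0
    for field, weight in WEIGHTS.items():
        if rec_a.get(field) and rec_b.get(field):
            total += weight * similarity(rec_a[field], rec_b[field])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.9) -> bool:
    """Declare a match when the score reaches the (assumed) threshold."""
    return match_score(rec_a, rec_b) >= threshold


clean = {"name": "Thomas Smith", "dob": "1960-05-17", "gender": "M"}
ocr = {"name": "Thomas Srnith", "dob": "1960-05-17", "gender": "M"}  # "m" read as "rn"
other = {"name": "Thomas Smith", "dob": "1982-11-02", "gender": "M"}
```

With these assumed weights, the OCR-damaged record still matches the clean one, while the record with an identical name but a different date of birth does not.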

The variety of data acquisition methods involved in the collection of medical 
information increases the difficulty of assimilating these facts into a comprehensive 
patient history. A majority of relevant medical history is still hand-written into patient 
charts. Often this information is difficult or impossible to acquire electronically. Data 
is frequently collected in an urgent or emergent clinical setting where it is 
impracticable to interact directly with an online data acquisition tool. The
non-prompted and non-validated nature of this data collection often leads to incomplete
information, and data acquired in this manner is difficult to merge later into an
electronic system. Snapshot digital images of the paper charts can be linked to patient
database records; however this increases the storage requirements of the system 
without significant analytical benefit. Images are not easily analyzed 
programmatically. Physician dictation is also not easily captured. Voice recognition 
technologies are in their infancy, and libraries to address the specialized vocabulary of 
the medical domain are even less developed. Transcription remains the primary 
method of electronically acquiring dictation; and even then this information is in 
natural-language syntax rather than an easily analyzable format. Some attempt has 
been made to facilitate directed point of care data entry by physicians through mobile 
online systems such as Pen & Pad which prompt the user to build statements which 
are valid in a predefined meta-knowledge syntax [2]. However, in practice point of care
data collection is largely performed on paper forms, which are converted to electronic 
format by means of OCR scanners. This acquisition method is prone to character 
misrecognition and form alignment issues. 

Thus, from administrative as well as treatment viewpoints, correct and
consistent maintenance of the EMR (Electronic Medical Record) is highly desirable. This
kind of maintenance must not undermine security or the efficiency of data access
and management. In this paper we do not address security, which is a highly complex
problem to solve in its own right.

Expressing the entire patient experience in an EMR, or assembling a comprehensive
continuum of care, has become extremely complex, but these problems have to be resolved
efficiently in order to identify and treat patients accurately across systems. If the
specialized component systems are to be expressed through a federated schema or
contribute to a data warehouse, an automated mechanism must be built to facilitate
this entity resolution.

Due to limitations of the Human Computer Interface (HCI) in HIMS, the data
collection process, especially at the point of care, is prone to error. This error, however,
once quantified, can be exploited to provide an information gain. Our work in progress
explores using contextually implied keys and probabilistic adjustments to enhance a
traditional key equivalence approach for solving some of the medical data
management problems.

We observe that data produced by a source acquires some characteristics of the 
source. Thus, an instrument that measures blood pressure or heart beat leaves its mark
on the result. We argue that a reverse process can be applied, which will allow us to
identify the source of data by examining the data value. Thus, by looking at a
patient’s history it would be possible to identify which medical instruments were used
to collect health data for diagnosis. This will also give us a reasonable indication of
data accuracy and some motivation for further examination.



2.2 World Wide Web (Web) 

The advent of the World Wide Web (Web) presented a globally sharable repository. Any
user could surf the web at any time without being aware of the physical location of the
repository. However, it did little to create a unified information space of the kind now
being experimented with in the Grid infrastructure. In spite of a number of limitations,
the web has turned out to be an excellent platform for e-commerce and m-commerce.
Organizations no longer want to limit the scope of the web to a repository and a
showcase; rather they want to use it as a powerful communication tool to disseminate
the latest information on all kinds of things. They find web systems more suitable in
every respect than legacy systems for managing their activities because of their
flexibility and universality. As a result, all information storage and access
activities are migrating to the web. In our work we are experimenting with web services
to provide desired information and services to static as well as mobile users.

Web Bazaar: A Web-Based System for Service Discovery. There have been
increasing demands from mobile users (M-users) to access location-based information
(locations of restaurants, movie theatres, etc.) and desired services (ticket booking,
buying pizzas, etc.) at any time and from anywhere through mobile devices using
a Location Dependent Query (LDQ). With the significant advances in mobile technology,
this facility can easily be incorporated into the existing mobile infrastructure. The idea of
providing services and information through the web is not new, and currently a
few middleware-based solutions for accessing location-based information are
available [1], [2], [3], [5], [8], [9], [16], but they are limited in scope. Their limitation
is mainly due to a tight integration between Content Providers (CP) and Service
Providers (SP), which makes dynamic configuration harder to develop and
expensive to process.



Fig. 4. Location based information scheme (figure labels: Service Provider (SP), Content Provider (CP))



Figure 4 illustrates the current scheme. Each CP provides specific information (e.g.,
weather, hotel, etc.) and supports a specific format. A CP or a number of CPs has to
individually register with a SP to satisfy the needs of a mobile user. In this tight
integration or mapping, the user may have to contend with a fixed information format
and if the user wants information on a particular topic his SP may not be able to
provide it because the SP may not be able to register with the desired CP dynamically.
In order to overcome these problems and efficiently satisfy all users’ (static or
mobile) demands, we propose to use a Web service as an interface (middleware)
between the CPs and SPs. Thus, a SP will interact with Universal Description,
Discovery & Integration (UDDI), which in turn will reach the relevant web service to get
the answer.

Web Service-based middleware does provide a standard way to communicate
among heterogeneous applications, and it is highly flexible and scalable, but at present it
is not well equipped to provide location-based services because (a) it uses a centralized
repository (e.g., UDDI) for publishing Web services, (b) it has a limited keyword-based
search facility for services, and (c) it lacks appropriate semantics for discovering
location based services. We propose to overcome these limitations in our middleware
architectural framework, referred to as “Web Bazaar”, which will make it
possible to discover location-based web services easily and cheaply through a
location-aware UDDI. We present a couple of simple examples to show the
usefulness of our proposal.

Example 1. A user subscribes to a SP for service by giving payment information and a
preference profile. During a trip to Kansas City the user wants to go to a coffee
shop. He enters the request (using some location dependent query language), gets the
list of coffee shops (identified using his personal profile), selects the shop that offers a
discount on coffee, clicks the link and pays for the item. In return he gets a
transaction id, goes to the shop, enters the id and gets his coffee. The user profiles
can be maintained and the information can be given to the user proactively.

Example 2. A user wants to eat a special pizza. He selects a pizza store using his mobile
device after getting the store’s information from Web Bazaar. The service selects the right
kinds of pizzas using information from the profile. The pizza order is given to the shop
and when it is ready the GPS service is used to get the user’s location. The user location is
dispatched to a map web service to obtain a route for delivery.
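
The discovery step in the two examples can be sketched as a registry lookup that filters on distance as well as on keywords. This is a minimal illustration only: the `Service` record, the `discover` function, the registry contents and the coordinates are all hypothetical, and the sketch does not use or model the real UDDI API.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Service:
    name: str
    keywords: frozenset
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def discover(registry, keyword, lat, lon, radius_km):
    """Keyword match first, then keep only services within the radius."""
    hits = [
        (haversine_km(lat, lon, s.lat, s.lon), s)
        for s in registry
        if keyword in s.keywords
    ]
    hits.sort(key=lambda pair: pair[0])  # nearest first
    return [s.name for distance, s in hits if distance <= radius_km]


# Made-up registry entries; coordinates are illustrative only.
REGISTRY = [
    Service("Coffee A", frozenset({"coffee"}), 39.11, -94.58),
    Service("Coffee B", frozenset({"coffee"}), 32.78, -96.80),
    Service("Pizza C", frozenset({"pizza"}), 39.10, -94.59),
]

nearby = discover(REGISTRY, "coffee", 39.10, -94.58, radius_km=5.0)
keyword_only = discover(REGISTRY, "coffee", 39.10, -94.58, radius_km=2000.0)
```

With a very large radius the search degenerates to the keyword-only behaviour criticized in point (b); the small radius returns only the shop near the user.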



Issues in the Design and Development of Web Bazaar. M-commerce application
architecture frameworks can be broadly classified into two models: the Push Model and the
Pull Model [4], [5]. In the pull model the user requests a transaction, and the server looks
for an appropriate service, contacts the CP, retrieves the information, processes the data and
gives the results back to the user. In the push model the server collects the
information from different data sources according to the current location of the user
and pushes it to the mobile unit. Since our aim is to develop a proactive architecture for
m-commerce applications, the push model is better suited to our requirements. A proactive
architecture requires caching, on the mobile unit, of the context services a user is likely to
require, which greatly reduces the query processing time because the upward communication
from the mobile unit to the middleware is greatly reduced. So the major requirements
in mobile middleware are (a) semantic profile driven cache management, (b) semantic
web services description, (c) a semantic web services discovery protocol, (d) a
structure for UDDI that can search based on the location context of the user, and (e)
broadcasting of web services information. Figure 5 shows the components of the
proposed middleware.
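
The effect of the two models on upward communication can be illustrated with a toy simulation. The class names, the content, and the counting of requests are assumptions made for this sketch and do not describe the actual Web Bazaar middleware.

```python
class Middleware:
    """Hypothetical middleware holding content indexed by (location, topic)."""

    def __init__(self, content):
        self.content = content
        self.upward_requests = 0  # messages sent from mobile units to the server

    def query(self, location, topic):
        """Pull model: the mobile unit asks and the server answers."""
        self.upward_requests += 1
        return self.content.get((location, topic), [])

    def push(self, unit, location):
        """Push model: proactively send everything about the unit's location."""
        for (loc, topic), items in self.content.items():
            if loc == location:
                unit.cache[(loc, topic)] = items


class MobileUnit:
    def __init__(self):
        self.cache = {}

    def ask(self, middleware, location, topic):
        """Answer from the local cache if possible, otherwise pull."""
        key = (location, topic)
        if key in self.cache:
            return self.cache[key]
        return middleware.query(location, topic)


content = {("downtown", "coffee"): ["Shop A"], ("downtown", "pizza"): ["Store B"]}

pull_server, pull_unit = Middleware(content), MobileUnit()
pull_answers = [pull_unit.ask(pull_server, "downtown", t) for t in ("coffee", "pizza")]

push_server, push_unit = Middleware(content), MobileUnit()
push_server.push(push_unit, "downtown")
push_answers = [push_unit.ask(push_server, "downtown", t) for t in ("coffee", "pizza")]
```

Both units receive the same answers, but in the pushed case no upward request is issued, which is the saving the proactive architecture relies on.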







Fig. 5. A reference structure of Web Bazaar (figure labels: CP & SP Web service registry;
middleware comprising semantic caching, user profile manager, and semantic service
discovery; user profile; user context)



Semantic profile driven caching component. Traditional caching uses spatial
locality of data, but semantic caching considers the semantic relation of data. In our
case we relate data with respect to location, and data related to the same location is
cached together. For example, all restaurants are related to a location, so all the restaurants
at a particular location are cached rather than the restaurants stored physically next to
each other in the database. Semantic caching is required in our architecture to
minimize communication overhead and retrieval time.
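
The contrast between spatial and semantic caching can be shown with a small sketch. The database rows, the block size, and the `SemanticCache` class are hypothetical and serve only to illustrate the idea of caching by location.

```python
# Hypothetical database rows in physical storage order: (name, kind, location).
DATABASE = [
    ("Alpha Diner", "restaurant", "downtown"),
    ("Bravo Grill", "restaurant", "airport"),
    ("Charlie Cafe", "restaurant", "downtown"),
    ("Delta Bistro", "restaurant", "airport"),
]


def spatial_block(index, block_size=2):
    """Traditional caching: fetch the rows stored next to the requested row."""
    start = (index // block_size) * block_size
    return DATABASE[start:start + block_size]


def semantic_region(location, kind):
    """Semantic caching: fetch every row related to the same location."""
    return [row for row in DATABASE if row[1] == kind and row[2] == location]


class SemanticCache:
    """Cache keyed by (location, kind) rather than by storage address."""

    def __init__(self):
        self._regions = {}
        self.fetches = 0  # number of trips to the database

    def get(self, location, kind):
        key = (location, kind)
        if key not in self._regions:
            self._regions[key] = semantic_region(location, kind)
            self.fetches += 1
        return self._regions[key]


cache = SemanticCache()
first = [row[0] for row in cache.get("downtown", "restaurant")]
second = [row[0] for row in cache.get("downtown", "restaurant")]
neighbours = [row[0] for row in spatial_block(0)]
```

The spatial block mixes a downtown and an airport restaurant, whereas the semantic region holds exactly the downtown ones, and the second lookup is served without another database fetch.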

Semantic Caching for Location Dependent Data (LDD). When the user is on the
move, data for the current location is cached in the mobile unit (MU). The data that is
cached is based on the preferences specified by the user in his profile. Also, as the user
moves out of the current location, the data stored in the MU becomes invalid. Mechanisms
for cache replacement of location dependent data have been specified in [13]. For LDD,
cache replacement should be a balance between how frequently the location data is cached
and the proximity of the current user location to the location data stored on the MU.
The aim here is to optimize the tradeoff between the number of cache refreshes,
which are triggered by changes in user location, and the validity of the LDD stored on
the MU. Schemes like [14] assume that the speed and direction of the MU are available,
so that the future location of the MU can be predicted and the location data cached
accordingly, which further reduces the number of cache refreshes. The information
about user movement (direction, speed, etc.) can be obtained, and we plan to use this
in the development of Web Bazaar.
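
One way to picture the balance between usage frequency and proximity, and the benefit of predicting the MU's future location, is the sketch below. It is not the mechanism of [13] or [14]: the scoring formula, the `alpha` parameter, the planar coordinates, and the cache entries are all assumptions made for illustration.

```python
from math import hypot


def predict(position, velocity, horizon):
    """Dead reckoning: assume speed and direction stay constant."""
    return (position[0] + velocity[0] * horizon,
            position[1] + velocity[1] * horizon)


def retention_score(entry, position, alpha=1.0):
    """Higher means more worth keeping: frequently used and close by."""
    distance = hypot(entry["x"] - position[0], entry["y"] - position[1])
    return entry["hits"] / (1.0 + alpha * distance)


def choose_victim(cache, position, velocity=(0.0, 0.0), horizon=0.0):
    """Evict the entry with the lowest score at the predicted position."""
    future = predict(position, velocity, horizon)
    return min(cache, key=lambda name: retention_score(cache[name], future))


# Hypothetical cached regions with equal usage counts.
cache = {
    "behind": {"x": -1.0, "y": 0.0, "hits": 5},
    "ahead": {"x": 3.0, "y": 0.0, "hits": 5},
}

victim_static = choose_victim(cache, position=(0.0, 0.0))
victim_moving = choose_victim(cache, position=(0.0, 0.0),
                              velocity=(1.0, 0.0), horizon=2.0)
```

Without movement information the farther region ahead is evicted; once the predicted position is used, the region behind the user is evicted instead, which avoids a refresh shortly afterwards.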

We also have to deal with the user connectivity problem. If the user is continuously
connected to the wireless network then the cached data can be refreshed as soon as
the user moves to a different location. But if the user is not continuously connected then
a data-recharging scheme [11] needs to be used. Whenever the user connects to the
network, the cache is recharged with data depending on the current location, his future
plans and the preferences the user specifies in the profile.

Thus, in the information management discipline the integration, diffusion, and
merging of web, data warehousing, business processes, mobility, networking, etc.,
become seamless. Nowadays it is very common to hear of mobile web mining, web
mining, web caching, mobile database systems, mobile federated systems, and so on,
and of the state of research and development in this integrated area. It is, therefore,
not acceptable to say that one is doing research in mobile web caching but is not
knowledgeable in the mobile discipline. Researchers as well as research activities are
integrated, diffused and merged in a seamless manner.



3 Mobility 

Wireless communication through PCS (Personal Communication Systems) or GSM
(Global System for Mobile Communications) has become a norm of present day
society. Cell phones are more common than watches and, in addition to being portable
communication tools, they have become web-browsing platforms.
Telecommunication companies are continuously improving the communication
quality, security, availability and reliability of cell phones and are trying to enhance their
scope by adding data management capabilities, which is highly desirable. Motivated
by this growing demand, we envision an information processing system based on
the PCS or GSM architecture, which we refer to as the Mobile Database System (MDS).
It is essentially a distributed client/server system where clients can move around
freely while performing their data processing activities in connected, disconnected or
intermittently connected mode. The MDS that we present here is a ubiquitous database
system where, unlike in conventional systems, the processing unit can also reach the data
location for processing. Thus, it can process debit/credit transactions, pay utility bills,
make airline reservations, and perform other transactions without being subject to any
geographical constraints. Since there is no MDS type of system available, it is
difficult to identify the transaction volume at mobile units; however, the present
information processing needs and trends in e-commerce indicate that the transaction
workload at each mobile unit could be high and that MDS would be a useful resource to
organizations and individuals alike.

Although MDS is a distributed system based on the client/server paradigm, it functions
differently from conventional centralized or distributed systems and supports diverse
applications and system functionalities. It achieves such diverse functionalities by
imposing comparatively more constraints and demands on the MDS infrastructure. To
manage system-level functions, MDS may require different transaction management
schemes (concurrency control, database recovery, query processing, etc.), different
logging schemes, different caching schemes, and so on. The topic of this paper is log
management for application recovery through the use of mobile agents.

3.1 Reference Architecture of Mobile Database System 

Figure 6 illustrates our reference architecture of the Mobile Database System (MDS). It is
a distributed multidatabase client/server system based on cellular infrastructure. We
have added a number of DBSs (Database Servers) to incorporate data processing
capability without affecting any aspect of the generic mobile network [3].

A set of general purpose computers (PCs, workstations, etc.) is interconnected
through a high-speed wired network; these computers are categorized into Fixed Hosts (FH)
and Base Stations (BS) or mobile support stations (MSS). One or more BSs are connected
with a BS Controller or Cell Site Controller (BSC) [9], which coordinates the
operation of BSs using its own stored software program when commanded by the
MSC (Mobile Switching Center). We also incorporate some additional simple data
processing capability in BSs to handle the coordination of transaction processing.








Fig. 6. A reference architecture of Mobile Database System (MDS) 

Unrestricted mobility in PCS and GSM is supported by wireless links between BSs
and mobile units such as PDAs (Personal Digital Assistants), laptops, cell phones, etc.
We refer to these as Mobile Hosts (MH) or Mobile Units (MU) [9], [12], which
communicate with BSs using wireless channels [9]. The power of a BS defines its
communication region, which we refer to as a cell. The size of a cell depends upon
the power of its BS and is also restricted by the limited bandwidth of wireless
communication channels. Thus, the number of BSs in MDS defines the number of
cells. In reality a high power BS is not used because of a number of factors [9], [12];
rather, a number of low power BSs are deployed for managing the movement of MUs.

A MU may be powered off, in an idle state (doze mode), or actively
processing data, and it can freely move from one cell to another. When a MU crosses a
cell boundary, it is disconnected from its last BS and gets connected to the BS of the
cell it enters. In such inter-cell movement the handoff mechanism makes sure that the
boundary crossing is seamless and data processing is not affected.
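
The cell-crossing behaviour described above can be sketched with a toy model. It assumes non-overlapping cells laid out along a one-dimensional road; the cell ranges, the class, and the operation names are hypothetical, and real handoff involves signalling between the BS, BSC and MSC that is not modelled here.

```python
# Hypothetical cells on a one-dimensional road: BS name -> [start, end) in km.
CELLS = {"BS1": (0.0, 5.0), "BS2": (5.0, 10.0)}


def bs_for(position_km):
    """Return the BS whose cell contains the position, or None if uncovered."""
    for name, (start, end) in CELLS.items():
        if start <= position_km < end:
            return name
    return None


class MobileUnit:
    def __init__(self, position_km):
        self.bs = bs_for(position_km)
        self.handoffs = 0
        self.processed = []  # work done so far; must survive a handoff

    def move_and_process(self, position_km, item):
        new_bs = bs_for(position_km)
        if new_bs is not None and new_bs != self.bs:
            # Handoff: leave the last BS and attach to the BS of the new cell.
            self.bs = new_bs
            self.handoffs += 1
        self.processed.append(item)  # data processing continues regardless


mu = MobileUnit(1.0)
for step, position in enumerate([2.0, 4.5, 5.5, 7.0]):
    mu.move_and_process(position, f"op-{step}")
```

The unit crosses the boundary at 5 km once, ends up attached to the second BS, and none of its operations are lost, which is the seamlessness the handoff mechanism is meant to guarantee.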

A DBS provides full database services and it communicates with MUs only
through a BS. DBSs can be installed at BSs, can be a part of FHs, or can be
independent of both BSs and FHs. A MU is unable to provide reliable storage as provided by
conventional clients and for this reason it usually relies on the static nodes (FH or BS)
to save its data. This is especially true for activities such as recovery, logging,
concurrency control, data caching, etc. It is possible to install DBSs at BSs; however,
we argue against this approach. Note that a BS is a switch and has specific tasks to
perform, which do not include database functionality. To work as a database server
the entire architecture of a BS (hardware and software) may have to be revised, which
would be unacceptable from the mobile communication viewpoint. We argue that mobile
database functionality and wireless communication should be modular, with a minimum
degree of overlap in their functionality. For these reasons, and for the reason of
scalability, we created DBSs as separate nodes on the wired network, which can be
reached by any BS at any time.







V. Kumar 



3.2 Mobilaction: A Mobile Transaction Model 

The transaction concept is essential for dealing with any type of information
management. This is especially true in MDS because it imposes a number of new
constraints on information management. Motivated by the unique requirements of MDS,
we developed a mobile transaction model, which we refer to as “Mobilaction” [20].
Before introducing Mobilaction, we present some data characteristics related to
mobility.

Conventional data do not change their values based on the mode of the query
(when and where the query originated). Consider, for example, the “SSN”, “mother's
maiden name”, “city of birth”, “mother tongue”, etc., of a person. Any enquiry about
these attributes of the person from anywhere in the world will provide an
identical response. On the other hand, there are some data types that generate different
but correct responses when the mode of the query on them changes (for example,
room rent of a hotel, sales tax, city tax, etc.). If an enquiry on the tax rate is made
in Kansas City and then in Dallas, there will be two different but correct
answers. Thus the same query on this data from a moving object with a changing
query location could have different correct answers. We refer to the first type of data
as “location free data” and to the second type as “Location Dependent Data (LDD)”.
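
The distinction can be made concrete with a small sketch. The attribute names and the tax rates below are made-up illustrative values, and the `answer` function is a hypothetical helper, not part of any system described in this paper.

```python
# Location free data: the answer does not depend on where the query originates.
PERSON = {"city_of_birth": "Boston", "mother_tongue": "English"}  # illustrative

# Location Dependent Data (LDD): one correct value per query location.
SALES_TAX = {"Kansas City": 0.08, "Dallas": 0.0825}  # made-up rates


def answer(attribute: str, query_location: str):
    """Answer a query on `attribute` issued from `query_location`."""
    if attribute in PERSON:
        # The query location is ignored for location free data.
        return PERSON[attribute]
    if attribute == "sales_tax":
        # The query location selects the correct value for LDD.
        return SALES_TAX[query_location]
    raise KeyError(attribute)
```

The same `sales_tax` query issued from two cities yields two different answers, each correct for its origin, whereas `mother_tongue` yields the same answer everywhere.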

LDD gives rise to the Location Dependent Query (LDQ) and the Location Aware Query
(LAQ). The answer to an LDQ depends on the geographical origin of the query. For
example, the answer to the query “What is the distance of the airport?” is strongly tied to
the geographical origin of this query. Now let us introduce mobility into query
processing. Consider, for example, a person who is traveling by car on a
business trip from Boston, first to Kansas City and then to Dallas. While on the road
the traveler asks “What is the distance of the airport?” every few
minutes. The system will generate multiple correct answers to this query, and each
answer will be strongly related to the geographical origin of the query. From
this reasoning we concluded that, like the ACID (Atomicity, Consistency,
Isolation, and Durability) properties, location mapping has to be a basic property of
Mobilaction, and we incorporate it. We now formally define our Mobilaction model.

An Execution Fragment $e_{ij}$ is a partial order $e_{ij} = \{O_j, <_j\}$ where

• $O_j = OS_j \cup \{N_j\}$, where $OS_j = \bigcup_k O_{jk}$, $O_{jk} \in \{\mathit{read}, \mathit{write}\}$, and
$N_j \in \{\mathit{abort}_L, \mathit{commit}_L\}$. Here these are location dependent commit and abort.

• For any $O_{jk}$ and $O_{jl}$ where $O_{jk} = R(x)$ and $O_{jl} = W(x)$ for a data object $x$,
either $O_{jk} <_j O_{jl}$ or $O_{jl} <_j O_{jk}$.

• $\forall\, O_{jk} \in OS_j$, $O_{jk} <_j N_j$.

A Mobile Transaction $T_i$ is a triple $\langle F_i, L_i, FLM_i \rangle$ where $F_i = \{e_{i1}, e_{i2}, \ldots, e_{in}\}$ is
a set of execution fragments, $L_i = \{l_{i1}, l_{i2}, \ldots, l_{in}\}$ is a set of locations, and
$FLM_i = \{flm_{i1}, flm_{i2}, \ldots, flm_{in}\}$ is a set of fragment location mappings where
$\forall j$, $flm_{ij}(e_{ij}) = l_{ij}$.
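
As a rough illustration, the triple of fragments, locations, and fragment location mappings can be written down as a data structure. This is only a sketch under simplifying assumptions: each fragment is stored as a list, i.e., a total order, which is a special case of the partial order in the definition, so the ordering conditions hold trivially; all class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # "read" or "write"
    item: str  # name of the data object x


@dataclass
class ExecutionFragment:
    name: str
    ops: list       # operations OS_j, listed in execution order
    terminal: str   # N_j: "commit" or "abort", decided at the fragment's location

    def is_well_formed(self) -> bool:
        kinds_ok = all(op.kind in ("read", "write") for op in self.ops)
        return kinds_ok and self.terminal in ("commit", "abort")


@dataclass
class MobileTransaction:
    fragments: list  # F_i
    locations: set   # L_i
    flm: dict        # FLM_i as a mapping: fragment name -> location

    def is_well_formed(self) -> bool:
        # Every fragment must be well formed and mapped to a location in L_i.
        return all(
            f.is_well_formed() and self.flm.get(f.name) in self.locations
            for f in self.fragments
        )
```

A transaction whose fragment is mapped to a location outside its location set, or has no mapping at all, is rejected by `is_well_formed`.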







3.3 Mobilaction: Execution and Commitment 

Although MDS is a distributed system based on the client-server paradigm, it functions
differently from conventional centralized or distributed systems and supports diverse
applications and system functionalities. It achieves such diverse functionalities by
imposing comparatively more constraints and demands on the MDS infrastructure. To
manage system-level functions, MDS may require different transaction management
schemes (concurrency control, database recovery, query processing, etc.), different
logging schemes, different caching schemes, and so on. We describe one way of
executing a Mobilaction on MDS and present a commit protocol.

A Mobilaction may run on multiple nodes, which could be located anywhere in the
network. Each $e_i$ represents a subset of the total $T_i$ processing. A $T_i$ is requested at a
MU and fragmented [20], and its fragments are executed at the MU and at a set of DBSs. Note that
no fragment of a $T_i$ is sent to another MU for execution. This is because in MDS a
MU is a personal unit whose use is controlled by its owner, who can switch it off or
disconnect it from the network at any time. This could force the $T_i$ to fail unnecessarily.
Furthermore, other MUs may not have the data necessary to process a fragment
generated by another MU, in which case the fragment will end up at a DBS. Also, the
transfer of $e_i$s to other MUs would incur wireless communication overhead, which could
be prohibitive.

In MDS, as in conventional distributed database systems, a coordinator (CO) is
required to manage the commit of $T_i$ [20], and its role can be illustrated with the
execution of a $T_i$. A $T_i$ originates at a MU, and the BS of that MU is identified as the holder of the
CO of $T_i$. The MU fragments $T_i$, extracts its $e_i$, sends $T_i - e_i$ to the CO, and begins
processing $e_i$. The MU may move to another cell during the execution of $e_i$, and this move must
be logged for recovery. At the end of the execution of $e_i$ the MU updates its cache
copy of the database, composes an update shipment, and sends it to the CO. The CO logs the
updates from the MU.

Upon receipt of $T_i - e_i$ from the MU, the CO splits $T_i - e_i$ into $e_j$s ($i \neq j$) and sends them
to a set of relevant DBSs for execution. Note that handoffs may delay
the execution and commit of a $T_i$. In this situation even a small $T_i$ may appear to be a
long-running $T_i$. Thus, a long-running $T_i$ on MDS could be (a) a small
$T_i$ (such as debit/credit) that takes a long time to run because of frequent handoffs or
(b) a $T_i$ that accesses a large number of data items, such as the preparation of bank
customers' monthly statements, and takes a long time to execute even in the absence of any
handoff. It is, however, pointless to run statement preparation transactions on a MU,
so long-running transactions in our case will mostly be of type (a).

A conventional two-phase or three-phase commit protocol [10]
would not work satisfactorily in MDS. It would generate excessive overhead, which
the MDS could not handle. We have developed a commit protocol, which we
refer to as TCOT (Transaction Commit on Timeout), that meets the following
objectives:

• It uses a minimum number of wireless messages.

• The MU and DBSs involved in $T_i$ processing have independent decision-making
capability.

• It is non-blocking.

TCOT is based on the timeout concept. Timeouts are usually used to identify a failure
situation. For example, in messaging systems the sender waits for the
acknowledgement of a message receipt for a timeout period before resending or
abandoning the message. In distributed database systems the use of timeouts is
necessary for developing a “non-blocking” transaction commit protocol
[10], [11], [12]. We propose the use of timeouts for our commit protocol, but we assume
that the end of the timeout period indicates success rather than failure. Thus, at the end
of the timeout it is expected that the receiver has received the message sent by the
sender. This is the basis for defining the completion of transaction commit in TCOT.

TCOT strives to limit the number of messages (especially uplink) needed to
commit a $T_i$. It does so by assuming that all members of a commit set successfully
commit their fragments within the defined timeout, leading to the commit of $T_i$. Unlike
2PC or 3PC [10], [11], [12], no further communication between the CO and the
participants takes place to keep track of the progress of fragments. However, a
failure is immediately communicated to the CO, which then makes the final decision about
commit.
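
The commit-on-timeout idea can be sketched as follows. This is a deliberately simplified illustration and not the TCOT protocol of [13]: time is a logical integer, the class and method names are hypothetical, and a failure report arriving after the deadline is simply ignored, which a real protocol could not do.

```python
class TimeoutCoordinator:
    """Toy coordinator: silence until the deadline is read as success."""

    def __init__(self, participants, deadline):
        self.participants = set(participants)  # commit set (the MU and DBSs)
        self.deadline = deadline               # logical time at which the timeout expires
        self.failures = {}                     # participant -> time of its failure report
        self.messages_received = 0

    def report_failure(self, participant, at_time):
        # Only failures are reported; successful participants stay silent.
        if participant not in self.participants:
            raise ValueError("unknown participant: %r" % (participant,))
        self.messages_received += 1
        if at_time <= self.deadline:
            self.failures[participant] = at_time

    def decision(self, now):
        if self.failures:
            return "abort"
        if now >= self.deadline:
            return "commit"   # timeout expired without any failure report
        return "pending"
```

In the failure-free case the coordinator reaches "commit" without receiving a single progress message, which is the message saving that the timeout assumption is meant to provide.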

It is well known that finding the most appropriate value of a timeout is not always 
easy because it depends on a number of system variables, which could be difficult to 
quantify [10]. However, it is usually possible to define a value for timeout, which 
performs well in all cases. It should be noted that an imprecise value of timeout does 
not affect the correctness but affects the performance of an algorithm. We, therefore, 
assume that timeout value can be defined with some degree of accuracy satisfactory to 
TCOT. We discuss in detail the behavior and performance of TCOT in our 
paper [13]. 



3.4 Application Recovery in Mobile Database System 

An efficient scheme for application recovery is required for MDS. We are not concerned with
database recovery because that is taken care of by the underlying database recovery
mechanisms. We have developed an efficient recovery protocol that uses mobile
agents to recover from any kind of failure.

Application recovery, unlike database recovery, enhances application availability
by recovering the execution state of applications. This process is relatively more
complex than database recovery because of (a) the large number of applications
required to manage database processing, (b) the presence of multiple application states,
and (c) the absence of the notion of a “last consistent state”. It becomes even more
complex in MDS because of (a) the unique processing demands of mobile units, (b) the
existence of random handoffs, and (c) the presence of operations in connected,
disconnected, and intermittently connected modes.

Log management is the main component of any recovery scheme. We present
here our mobile agent based recovery approach. We argue that in MDS
conventional approaches to log management, even with modifications, would impose
an unmanageable burden on the limited channel capacity, and we therefore reject their use.

An efficient recovery scheme requires that log management consume
minimum system resources and recreate the execution environment as soon as
possible after the MU reboots. For application recovery the MU and the server must build
a log of the events that change the execution states of $T_i$. In conventional distributed
systems, log management is straightforward since no mobility is involved and a single
stable storage area is available for storing the log. In MDS a MU cannot be relied upon
and, therefore, it is necessary to store the log information at some stable place that can
survive MU failure. Schemes that provide recovery from PCS failure use the BS where
the MU currently resides for storing the log. Note that managing the log for PCS failure
is relatively easy because PCS does not support $T_i$ processing.

Our objective is to utilize the unique processing capability of mobile agents in
managing the application log for efficient application recovery, in a way that conforms to
MDS limitations and mobile discipline constraints. We aim to achieve this conformity
and the desired efficiency by incorporating the following properties in our scheme:
(a) communication overhead (wired/wireless) should be low, (b) recovery time should
be minimal, and (c) recovery schemes should be easy to deploy in the network.

A mobile agent is an autonomous program that can move from machine to machine
in a heterogeneous network under its own control [14]. It can suspend its execution at
any point, transport itself to a new machine, and resume execution from the point at which it
stopped. An agent carries both the code and the application state. In effect, the
mobile agent paradigm is an extension of the client/server architecture with code
mobility. Some of the advantages of mobile agents that we exploit are:

a. Protocol Encapsulation: Mobile agents can incorporate their own protocols in 
their code instead of depending on the legacy code provided by the hosts. 

b. Robustness and fault-tolerance: When failures are detected, host systems can 
easily dispatch agents to other hosts. This ability makes the agents fault-tolerant. 

c. Asynchronous and autonomous execution: Once the agents are dispatched
from a host, they can make decisions independently and autonomously.

Our idea was to delegate all operations that involve mobility to mobile agents. We
created a number of agents for creating, identifying, and writing log records. Thus,
one agent was responsible for writing local log records, one for
dispatching the local log records to stable storage (at the base station), one
for identifying mobile unit failure and for log unification, and one
for making the log available to the mobile unit for recovery. Under a scheme
referred to as “Forward Strategy” we developed two recovery protocols: (a) the Forward Log
Unification Scheme and (b) the Forward Notification Scheme. To evaluate
our schemes we compared their performance with that of three other schemes.
We showed that our schemes gave better performance in most of the recovery
situations.
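
The division of labor among the agents can be sketched with ordinary objects. This is an illustrative approximation only: the classes below are not mobile agents (they do not migrate between machines), they do not implement the Forward Log Unification or Forward Notification protocols, and all names are hypothetical. The sketch assumes that each log record carries a sequence number assigned at the MU, so that partial logs left at different base stations can be merged.

```python
class BaseStationStore:
    """Stable storage at a base station."""

    def __init__(self, name):
        self.name = name
        self.records = []


class LogWriterAgent:
    """Writes local log records on the mobile unit."""

    def __init__(self):
        self.local = []
        self.seq = 0

    def write(self, event):
        self.seq += 1
        self.local.append((self.seq, event))


class DispatchAgent:
    """Ships the local log to the stable storage of the current base station."""

    def ship(self, writer, store):
        store.records.extend(writer.local)
        writer.local.clear()


class UnificationAgent:
    """Merges the partial logs left behind at several base stations."""

    def unify(self, stores):
        merged = [record for store in stores for record in store.records]
        return sorted(merged)  # sequence numbers restore the original order
```

After a handoff the log of one application is spread over two stores; unification rebuilds it in sequence order regardless of the order in which the stores are visited.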



4 Sensor Technology 

The all-pervading nature of the information space introduced earlier is a very useful
property, but at the same time it creates a serious problem: how to capture
information from geographical locations that are not easily reachable by
humans, such as the ocean bed, enemy territories, and deep space. The medical
field has a similar difficulty: how to collect data from live internal organs such as
the liver and the heart. Such requirements gave rise to sensor technology, in which a minute
device called a “sensor” is used for data collection, validation, processing, and
storage. A sensor is a programmable, low-cost, low-power, multi-functional device.
One of its functions is to continuously gather
desired information about the location of its deployment. For the medical field two types
of sensors, (a) immersive and (b) non-immersive, were developed. Immersive sensors
are planted inside the human body and non-immersive sensors remain outside the
body.




Fig. 7. An ESN with Micro-sensornet




We define the concept of “Embedded Sensor Space (ESS)”, which is a countably
infinite set of uniquely programmed sensors. Thus, $ESS = \{s_1, s_2, \ldots, s_\infty\}$ where $s_i$
($i = 1, 2, \ldots, \infty$) is a programmed sensor. A node in the embedded sensor net captures
data from its environment and dispatches it to other sensors through routers. There are
quite a few unsolved problems related to network management and routing. From the
ESN viewpoint $s_i$ and $s_j$ ($i \neq j$) are fully connected and have a direct communication
facility.

We are working on a complex sensor network, which is illustrated in Figure 7. The
embedded sensor net is composed of individual sensors and micro-sensor nets.
A micro-sensor net is a small set of specialized sensors that are fully connected
together. One of the nodes in a micro-sensor net is responsible for coordinating the
activities of the other sensors in the set. This gives rise to the leader election
problem, which we do not discuss in this paper.

The unique properties of sensors allowed us to link them in all kinds of topology. 
Thus, it is possible to build a globally connected infrastructure where uniquely 
programmed sensors are embedded at desired places. For example, programmed 
sensors may be embedded at various points in all cars of a family, in the house, in the 
office, in children's school bags, in parents' briefcases, etc. Each sensor will capture 
data and communicate with other sensors, which will help the parents to be aware of 
and be fully connected with the everyday activities of each family member. In time of
need any family member can be reached instantly. On a large scale we can visualize
sensor deployment at various places (buildings, malls, factories, etc.) of a city for 
continuous monitoring of events for managing security. Similarly to protect water 
supply, gas pipelines, and so on, programmed sensors can be deployed at strategic 
locations. 








[Figure: diagram with the labels Authorized users, Browser, DBMS, Database, WebDb, Data warehouse]

Fig. 8. Data capture, validation, analysis, formatting, and storage

Figure 8 illustrates an example of a sensor network deployed to monitor gas
emission at various landfills and to send information to the DBMS for further processing
and dissemination. The reference sensor network architecture we envision will be
very large and highly data intensive, since sensors will be capturing data continuously.
The emission of gases and their volume are not predictable, so these sensors will have
to be active continuously, and there must also be a fail-safe scheme that
ensures that any sensor failure or malfunction is promptly propagated to the servers
for immediate action to minimize the damage. This implies that the sensors will be sending
different types of data with different constraints associated with them. Some data will
be temporal in nature, with limited validity. These data must be processed in real time, and
decisions, if any, must be propagated to target sensors to change or alter their
functionality. The diversity of data categories, their real-time characteristics, and the
propagation of results to target sensors present a number of complex data
management problems.

We identify the following data management tasks, which must be performed
efficiently to manage the entire network and to disseminate the right information to
the right destinations (people and institutions). Several of these tasks are elaborated below,
together with our research approach to managing them.

• Stream data capture from specific sensors. 

• Validation of captured stream data.

• Analyzing, formatting, and verifying real-time processing of stream data. 

• Storing of formatted data in the database and updating the data warehouse. 

• Posting necessary information on the web. 



4.1 Stream Data Capture from Specific Sensors 

A sensor can send data in any form, e.g., pulses, packets, or sound. A large
number of data capture schemes are available [15], [18], [19]. We are in the process
of developing a scheme that takes energy conservation into consideration. The captured
stream data must be converted before it can be stored and processed. Since sensors in
our network could be location specific, we propose to program the L/L information into
the sensor, which will append it to each of its dispatches. We agree that
this will increase the cost, but the benefit will outweigh the cost. The conversion can
be done easily by a simple mapping function, since the type of the dispatch from a
sensor will be known in advance. A simple conversion table, residing in the interface,
will be satisfactory for the conversion.
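
A conversion table of the kind described above might look like the following sketch. The dispatch layout, the type names `pulse_count` and `packet`, the scaling factor, and the representation of the L/L information as a pair of numbers are all assumptions made for illustration.

```python
# Conversion table residing in the interface: dispatch type -> mapping function.
CONVERSION_TABLE = {
    "pulse_count": lambda raw: raw * 0.5,              # hypothetical scale factor
    "packet": lambda raw: int.from_bytes(raw, "big"),  # raw bytes -> integer reading
}


def convert(dispatch: dict) -> dict:
    """Convert one raw dispatch into a storable record, keeping its L/L tag."""
    mapping = CONVERSION_TABLE[dispatch["type"]]  # the type is known in advance
    return {"value": mapping(dispatch["raw"]), "ll": dispatch["ll"]}
```

Because the table is keyed by dispatch type, supporting a new kind of sensor only requires adding one entry.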



4.2 Validation of Captured Stream Data

Data must be validated before it is accepted for storage. Validation requires
(a) verifying the source of the data, which must be the right sensor, and (b) checking that the
captured data is not corrupt. We are investigating the use of a directory that will store
the mapping of L/L to sensor.
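
The two validation steps can be sketched as follows. The directory contents and the dispatch layout are hypothetical, and the use of a CRC-32 checksum (via Python's standard `zlib.crc32`) for the corruption check is an assumption of this sketch; the text does not prescribe a particular integrity check.

```python
import zlib

# Directory storing the mapping between L/L and sensor (illustrative entry).
DIRECTORY = {"s1": (39.1, -94.6)}


def make_dispatch(sensor_id: str, ll, payload: bytes) -> dict:
    """Build a dispatch as a sensor would, with a checksum over the payload."""
    return {"sensor": sensor_id, "ll": ll, "payload": payload,
            "crc": zlib.crc32(payload)}


def is_valid(dispatch: dict) -> bool:
    # (a) The source must be the right sensor: its L/L must match the directory.
    source_ok = DIRECTORY.get(dispatch["sensor"]) == dispatch["ll"]
    # (b) The captured data must not be corrupt: the checksum must still match.
    intact = zlib.crc32(dispatch["payload"]) == dispatch["crc"]
    return source_ok and intact
```

A dispatch is accepted for storage only if both checks pass.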



4.3 Posting Necessary Information on the Web 

Many organizations (private and government) will share the captured and processed
stream data results. We propose to make them available through a secure web site. The
web site will be accessed by static and mobile clients. We have done a significant
amount of work on mobility [13], [19], [22], [23], and we will use one of the existing
techniques for managing mobile data access and query. In addition, we will
implement an automatic message delivery system through which all mobile clients will be
informed by e-mail of the arrival of any crucial or pre-selected type of data from
any of the sensors.

In dealing with ESS, we regard a sensor as a data item with some semantics, which are
provided to it by its unique programmed state. In a massive ESN, therefore, it
becomes necessary to identify or trace individual sensors. This
requirement creates a situation that is very similar to data mining in the conventional
information space. In the conventional approach, when data is described by fuzzy information
such as “a large person with red shirt”, mining techniques are used to find the correct data. It
is likewise necessary to mine sensors in an ESN when fuzzy information such as “find the sensor
which captures temperature of high rise building” is used.

Sensors in a sensornet are insecure repositories and routers of data. There are many
applications where sensors are deployed in hazardous environments in which they are
subject to failure or to destruction under attack by a malicious adversary. For example,
consider seismic sensor networks in earthquake or rubble zones, or sensors in military
battlegrounds under enemy threat. Wireless sensor networks are also extremely
vulnerable to data loss under denial of service (DoS) attacks. Nodes use wireless
communication because the network's large scale, ad-hoc deployment, and limited
energy resources make wired or satellite communication impractical. Jamming
a transmitting node's frequency makes its data unavailable. Thus, any model for
ensuring effective query reporting and collaborative mining in sensor networks, while
incorporating the constraints of energy efficiency and distributed decision-making,
should simultaneously take sensor failure and security considerations into account.
This will require the development of specific algorithms to ensure that the tasks of
(a) data storage and content in distributed repositories (which could be special sensor
nodes within the network) and (b) data retrieval are not affected by the inhospitable
environment.







5 Conclusions and Future Direction 

In this paper we presented the role of information and its management for 
synchronizing our activities with the dynamic state of the environment around us. We 
identified essential activities which are continuously being imposed on the 
information we desire. We introduced the concept of fully connected information 
space to illustrate the nature and instances of these activities. We recognized 
the significance of advances in wired and wireless technologies and discussed their 
effect on information management schemes. We presented our research work and 
contributions in this area for wired and wireless platforms. In particular we 
addressed the problems of information integration, medical informatics, e-commerce, 
m-commerce, static and mobile web, and sensor technology. 

The future of information management is quite bright. Significant changes will
occur in the way we interact with the information space and in the state-of-the-art gadgets we
will use. The wireless world will dominate and continuous connectivity will persist.
Information processing will pervade every object of this world, and fancy gadgets will
rule our lives. Let us hope that we still drive these gadgets and not the other way round.



Acknowledgement. Thanks to Kelly Kern for his help in preparing the medical
informatics part and to Raj Kannan for his support in the presentation of the sensor
technology material. Julius Stuller’s input on web technology was highly useful.



References 

1. Ee-Peng, L., Srivastava, J., Prabhakar, S., Richardson, J.: Entity Identification in Database
Integration. Information Sciences, Vol. 89, No. 1 (1996) 1-38

2. Goeble, C.A., Glowinski, A., Crowther P.: Semantic Constraints in a Medical Information 
System 

3. Jin, L., Miyazawa, T.: MRM Server: A Context-aware and Location-based Mobile E- 
Commerce Server. In: Proceeding of Workshop WMC '02 

4. Varshney, U.: Location Management Support for Mobile Commerce Applications. In: 
Proceeding of Workshop WMC '02 

5. Munson, J.P., Gupta, V.K.: Location-Based Notification as a General-Purpose Service. In:
Proceeding of Workshop WMC '02

6. Jung, I.-D., You, Y.-H., Lee, J.-H., Kim, K.: Broadcasting and Caching Policies for
Location-Dependent Queries in Urban Areas

7. Sivashanmugam, K., Verma, K., Mulye, R., Zhong, Z.: Speed-R: Semantic P2P
Environment for Diverse web service Registries

8. Chakraborty, D., Perich, F., Avancha, S., Joshi, A.: DReggie: Semantic Service Discovery
for M-Commerce Applications

9. Pilioura, T., Tsalgatidou, A., Hadjiefthymiades, S.: Scenarios of using Web Services in M- 
Commerce. ACM SIGecom Exchanges, Vol. 3, No. 4, January 2003, 28-36 

10. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in
Database Systems. Addison-Wesley (1987)

11. Kumar, V., Son, S.H.: Database Recovery. Kluwer International

12. Kumar, V., Hsu, M.: Recovery Mechanisms in Database Systems. Prentice Hall (1998)







13. Kumar, V., Prabhu, N., Dunham, M., Seydim, Y.A.: TCOT - A Timeout-based Mobile
Transaction Commitment Protocol. Special issue of IEEE Transactions on Computers,
Vol. 51, No. 10, Oct. 2002, 1212-1218

14. Kotz, D., Gray, R., Nog, S., Rus, D., Chawla, S., Cybenko, G.: AgentTCL: Targeting the
needs of Mobile Computers. IEEE Internet Computing, Vol. 1, No. 4 (1997)

15. www.dpve.iimas.unam.mx/mocap/MocapSvstem.html 

16. Dunham, M.H., Kumar, V.: Location Dependent Data and its Management in Mobile 
Databases. Proc. of the Ninth Workshop of Database and Expert Systems Applications 
DEXA'98, Vienna, Austria, August 26-28, 1998 

17. Dunham, M., Kumar, V.: Impact of Mobility on Transaction Management. Int. Workshop
on Data Engineering for Wireless and Mobile Access (MobiDE99), Seattle, Washington,
August 20, 1999

18. http://64.38.123.238/flash/professional/proiects/glove.htm 

19. Lindsey, S., Raghvendra, C., Sivalingam, K.: Data Gathering Algorithms in Sensor
Networks Using Energy Metrics. IEEE Transactions on Parallel and Distributed Systems,
Vol. 13, No. 9, September 2002

20. Ren, Q., Dunham, M., Kumar, V.: Semantic Caching and Query Processing. IEEE 
Transactions on Knowledge and Data Engineering, to appear 

21. Samtani, S., Mohania, M., Kumar, V., Kambayashi, Y.: Recent Advances and Research
Problems in Data Warehousing. International Workshop on Data Warehousing and Data
Mining (ER '98), November 1998, 81-92

22. Samtani, S., Kumar, V., Mohania, M.: Self Maintenance of Multiple Views in Data 
Warehousing. 8th International Conference on Information and Knowledge Management 
(CIKM '99), Kansas City, MO, November 2-6, 1999, 292-299 

23. Seyedim, A., Dunham, M., Kumar, V.: An Architecture for Location Dependent Query
Processing. 4th International Workshop on Mobility in Databases and Distributed Systems,
Technical University of Munich, Germany, September 3-7, 2001

24. Seydim, Y., Dunham, M., Kumar, V.: Location Dependent Query Processing. 2nd ACM
Int. Workshop on Data Engineering for Wireless and Mobile Access (MobiDE01), Santa
Barbara, May 20, 2001, 47-53




Flexibility through Multiagent Systems 
Solution or Illusion? 



Peter C. Lockemann and Jens Nimis 



Fakultat fur Informatik, Universitat Karlsruhe 
Postfach 6980, 76128 Karlsruhe, Germany 
{lockeman, nimis}@ipd.uka.de



Abstract. Multiagent software systems are known to exhibit a system-level
behavior that rarely can be predicted from the description of individual agents
but must be observed in simulation or real life. On the other hand there are
indications that agent technology is superior in situations that are non-
deterministic or so ill-structured as to appear non-deterministic. This paper
examines whether one can give a more precise characterization of those
situations where multiagent systems hold great promise, and tests the
corresponding hypothesis by simulating a real-world production scenario. After
refining the hypothesis the paper examines whether one can guarantee the 
reliability of the agents in the presence of disturbances, because otherwise the 
flexibility of the multiagent system could become uncontrollable. After 
examining various transactional approaches that all pose major challenges the 
paper concludes that to go beyond an illusion still requires intensive research. 



1 The Need for Multiagent Software Systems 

1.1 A Minimalist View of Agents 

Multiagent systems are an extremely active area of computer science research [1].
What, then, is a multiagent system? Or, for that matter, an agent? Surprisingly, there
is no universally accepted definition of the notion of “agent”, rather there is still a 
good deal of ongoing debate on this subject [2]. One begins to wonder: How can there 
be so much activity, such high expectations in a technology if there is not even 
agreement on its basic concepts? Or from an application view: How can we judge the 
practical benefits of this technology if we do not clearly understand it? 

The reason for the debate may simply be that agents are too many things to too
many people. On the one hand take engineers. For example, some ten years ago
pieces of mobile but otherwise ordinary program code that were called agents were used
to overcome interruptions in telecommunications networks or to avoid large
transmission volumes, and were sent to other nodes to perform computations [3], [4].
Today the idea lives on in Java applets or in peer-to-peer mobile systems. More
ambitiously, in today’s global networks where the nodes are expected to provide
useful services the nodes must necessarily act with a certain degree of autonomy.
Consequently, any one node has only limited influence over how another node
responds to its requests, so that some people refer to the systems in the nodes as agents
even though these systems are just ordinary programs with a deterministic, externally
specified behavior.

P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 41-56, 2004.
© Springer-Verlag Berlin Heidelberg 2004

At the other end of the spectrum, take distributed artificial intelligence where 
agents are often adorned with almost human-like capabilities. They are autonomous in 
their decisions, reactive and proactive in line with the goals they pursue, they can 
learn from the past, reason, and communicate with others, and they may even have 
social ability. Such agents may replace humans under certain circumstances, e.g., 
when too many factors influence a decision that has to be taken in split seconds but 
the decision itself requires a certain degree of “intelligence”, or when the human 
person cannot be physically present. 

Engineers dislike independent and unpredictable system behavior whereas AI 
scientists view such behavior as one of the advantages of agent technology. But 
wouldn’t even practical systems have something to gain from a more independent 
behavior? Take the success of agents that predict stock quotations even though they 
have few of the AI capabilities. And indeed, there seems to be agreement on the basic 
essentials. Agents are autonomous, computational entities that are situated in some 
environment and to some extent have control over their behavior so that they can act 
without the intervention of humans and other systems. They are intelligent in the 
sense that they operate flexibly and rationally in a variety of environmental situations, 
given the information they have and their perceptual and effectual capabilities (see 
Prolog to [1] and also [4]). 



1.2 Multiagent Systems 

To be intelligent - even in the limited sense above - requires specialization. To
observe a complex environment in its entirety in order to reach a given goal or to
execute a given task requires a set of specializations. Hence, an agent must have its
own specialized competence, and several agents need to collaborate. As
a consequence, another essential property is that agents interact. Hence, they may be
affected by other agents in pursuing their goals and executing their tasks, either
indirectly by observing one another through the environment or directly through
a shared language.

Systems of interacting agents are referred to as multiagent systems (MAS). Our 
interest is in MAS that can be engineered, i.e., systems that have no “soft” properties. 
Hence we define a multiagent system as a system of autonomous, interacting, 
specialized agents that act flexibly depending on the situation. 

Note that nothing in this definition says that multiagent systems must necessarily 
be distributed systems. Any system with decentralized control and asynchronous 
communication meets the test. 



1.3 Multiagent Software Systems 

We know that larger software is organized into modules. In modern software systems 
the modules have the properties of objects. Objects seem a natural foundation for 
agents. All one has to do to turn them into a multiagent system is to write code for them 
that entails the minimal properties of autonomy, flexibility and interactivity. 




Flexibility through Multiagent Systems: Solution or Illusion? 






But let’s be careful. Do we always need autonomy, flexibility and interactivity? 
After all, multiagent systems come at a price. The system-level behavior often cannot 
be predicted analytically from the description of individual agents but must be 
observed in simulation or real life. Consequently, the detailed behavior of an 
implemented system may not be known in advance [5]. Hence, we should restrict 
multiagent software systems to situations where we are reasonably sure that we gain 
more than we pay for. There is agreement that MAS are at their best, i.e., fully play 
out their strengths, in environments (e.g., the real-world processes to be supported) 
that are non-deterministic or so complex or ill-structured as to appear 
non-deterministic [2]. 

Now, what is non-deterministic behavior? Or how non-deterministic should the 
environment become so that the price is worth the additional flexibility? It seems 
doubtful that one can come up with a metric to find an unambiguous answer, and even 
less one that would be general enough to apply to each and every environment. 
Instead, we will propose a methodical approach to find an answer for a given 
environment. We base the approach on a qualitative hypothesis [6]: 

Hypothesis: Multiagent systems offer an advantage if 

• the range of environmental situations (the problem space) is too large to be 
enumerated and dealt with by conventional means, 

• the problem space can be divided into sets of simpler tasks, each requiring 
specialized competence, 

• the simpler tasks can be dealt with autonomously by individual agents, 

• the overall situation can only be solved by cooperation among the agents. 

Given such an approach we would have to perform a whole set of experiments on a 
wide spectrum of software applications to gain credible empirical evidence for or 
against the hypothesis. This clearly is beyond the means of a single group of 
researchers. Rather we pursue the more modest goal of developing a first approach for 
a real-world scenario of a production facility. 

Suppose we are able to show that MAS have their applications. The autonomy and 
flexibility of agents can only go so far before the common orientation gets lost and 
turns into chaos. Hence, flexibility itself must be in some sense reliable or predictable. 
Therefore, we examine in a second step how agents and agent communication can be 
made robust in the presence of technical disturbances. 



2 Testing the Need for Flexibility 

2.1 Scenario 

We choose a shop floor scenario, in this case for the assembly of circuit breakers [7]. 
Shop floors are of particular interest for our purpose because on the one hand the 
incoming orders are the result of centralized production planning and control, while on 
the other hand it is the shop floor that has to cope with short-term disturbances like 
machine failures or priority jobs. Moreover, the shop floor is made up of several 
larger sections (Figure 1), among which we consider the Unit Assembly Area in more 
detail. It consists of 13 assembly lines where 6 different component families and 
4 subcomponent families are assembled. 



Fig. 1. Shop floor layout (figure not reproduced; one section label, “Product Assembly”, survives) 



This is a scenario that comes close to the conditions under which we believe MAS 
offer benefits: There is a good number of stations which offer the potential for local, 
decentralized decisions, and the overall situation space has a good starting size. 
However, to test our hypothesis it should be much larger. Of particular interest to 
production practice is an increase in the space by subjecting each assembly station to 
machine disturbances or - particularly feared - depletion of stock. 



2.2 Experiments 

To test the hypothesis we have to devise two sets of experiments. First, we have to 
show that in a large situation space a multiagent system (MAS) offers some 
advantages. More precisely, we examine whether MAS demonstrate improved 
performance in the presence of machine disturbances. Second, we have to show that 
in smaller situation spaces no benefits accrue from MAS. Here, we examine the 
behavior of MAS in the absence of disturbances. In fact, both experiments can be 
rolled into one by varying the level of disturbances. 

What is the basis for comparison (the benchmark)? We know that classical 
centralized planning (production planning and control, PPC) allows long-term 
optimization of production schedules provided the possibility of disturbances can be 
excluded. Disturbances, if they occur, are left to the shop floor to deal with. 
Specifically, we choose “Job-shop”, a line-balancing algorithm for the mixed-model 
sequencing problem with a longer-term horizon. 

We also have to settle on a suitable MAS architecture. Unfortunately, there is 
a large spectrum of possible architectures, and it is not known how different 
architectures influence the outcome of the experiments. We choose an architecture 
that seems a natural counterpart of the production scenario. We distinguish two kinds 
of agents: machine agents, each representing a production facility, and order agents. For the 
protocol governing the interaction within the MAS we investigated two algorithms 
that differed in their planning horizons. It turned out that the more potent one was an 
exclusively reactive algorithm in which a machine agent asks for new orders as soon 
as it has finished the current order (Figure 2). The algorithm, therefore, has no planning 
horizon whatsoever. Communication among the agents follows a protocol similar to 
the ContractNet protocol [8]. 
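
The reactive scheme can be illustrated with a minimal sketch. It assumes a much simplified Contract-Net-style round: an idle machine agent announces free capacity, every waiting order agent that needs one of its operations returns a bid, and the machine awards the order with the best rating. The class names, the rating (here simply the due date) and the simulation loop are hypothetical and are not taken from [7] or [8].

```python
from dataclasses import dataclass, field


@dataclass
class OrderAgent:
    """Hypothetical order agent; answers a machine's call with a bid."""
    name: str
    operation: str      # operation the order needs next
    due: int            # due date, used here as a simple rating
    duration: int       # processing time on a capable machine

    def bid(self, capabilities):
        # Bid only if the announcing machine can perform the operation.
        return self.due if self.operation in capabilities else None


@dataclass
class MachineAgent:
    """Hypothetical machine agent without any planning horizon."""
    name: str
    capabilities: set
    busy_until: int = 0
    log: list = field(default_factory=list)

    def request_work(self, now, waiting):
        # Announce free capacity, collect bids, award the lowest rating.
        bids = [(o.bid(self.capabilities), o) for o in waiting]
        bids = [(b, o) for b, o in bids if b is not None]
        if not bids:
            return None
        _, winner = min(bids, key=lambda pair: pair[0])
        waiting.remove(winner)
        self.busy_until = now + winner.duration
        self.log.append((now, winner.name))
        return winner


def simulate(machines, orders, horizon):
    """Discrete-time loop; returns the orders that were never started."""
    waiting = list(orders)
    for now in range(horizon):
        for m in machines:
            # A machine asks for a new order as soon as it is idle.
            if m.busy_until <= now and waiting:
                m.request_work(now, waiting)
    return waiting
```

Because each decision is taken only at the moment a machine becomes idle, a disturbance on one machine simply delays its next request; no schedule has to be repaired.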

We note that the experiment could have considerable practical value. Suppose that 
assembly follows a Kanban system, that is, a pull-oriented production system that 
follows decentralized, self-controlling control cycles with the main goal of 
minimizing the internal buffer stock. Then if MAS show superior results in the 
presence of disturbances, the local control cycles - and hence the Kanban system 
itself - could be made sturdier by realizing them via MAS. 

Since for obvious reasons nobody can interrupt production to experiment with 
various control schemes, the experiments had to be done by simulation. Details can be 
found in [7]. 



2.3 Benchmarking Results 

To verify or falsify the hypothesis we have to determine the parameters that are to be 
varied over the course of the experiments. These were the master data with the bill of 
materials and the operation list for each product, the number of machines and their 
assignment to operations, the list of customer orders, the lot size which affects the 
number of partial orders generated from a single customer order, the production order 
generated by the PPC, and the disturbance profile (disruption interval and duration) 
for each machine. By varying the master data, the list of customer orders, the lot sizes 
and the disturbance profiles we can influence the size of the situation space. 

The output variable for benchmarking that expresses the salient features of a production 
facility is the throughput. Both the average and the standard deviation are determined, 
the latter because it shows whether large variations occur - something that we would 
hope MAS would be able to reduce. 

The more than 1000 simulation runs proceeded as follows. For varying combinations 
of two input parameters, these two were held fixed while the others were 
varied. For each combination the results were compressed into two-dimensional area 
diagrams that indicate the number of scenarios where MAS perform better, where the 
Job-shop algorithm performs better, and where results are indifferent. For a detailed 
description and discussion see [7]. 
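
How such a tally might be computed is sketched below. The sketch assumes that each scenario yields a list of throughput times per approach, that lower average throughput time is better, and that results within a relative band `tolerance` count as indifferent; the band and all function names are assumptions made for illustration, not the procedure of [7].

```python
from statistics import mean, pstdev


def summarize(throughput_times):
    """Average and standard deviation of the throughput times of one run."""
    return mean(throughput_times), pstdev(throughput_times)


def classify(mas_times, jobshop_times, tolerance=0.05):
    """Label one scenario by comparing average throughput times.

    `tolerance` is an assumed relative band inside which the two
    approaches are treated as indifferent.
    """
    mas_avg, _ = summarize(mas_times)
    job_avg, _ = summarize(jobshop_times)
    if abs(mas_avg - job_avg) <= tolerance * job_avg:
        return "indifferent"
    return "MAS" if mas_avg < job_avg else "Job-shop"


def tally(scenarios):
    """Count the outcomes over all scenarios of one parameter combination."""
    counts = {"MAS": 0, "Job-shop": 0, "indifferent": 0}
    for mas_times, jobshop_times in scenarios:
        counts[classify(mas_times, jobshop_times)] += 1
    return counts
```

The three counts returned by `tally` correspond to the three regions of one area diagram.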

Fig. 2. AUML diagram of a reactive MAS approach solving a Mixed-model Assembly Line 
Balancing Problem 

We summarize the results of the experiments. One would expect that with little 
complexity of the planning task MAS are inferior to the Job-shop algorithm, 
something that was indeed borne out by the experiments. However, the first big 
surprise was that even when raising the complexity of the planning task by increasing 
the number of assembly operations for a product and by raising the level of 
disturbances, MAS still remained inferior to the Job-shop algorithm. At least this was 
true for the average, whereas MAS indeed reduced the standard deviation, i.e., MAS 
produced almost constant throughput times. Closer inspection revealed that MAS had 
little chance to play out their strengths because the assembly lines ran close to 
capacity. Apparently, MAS need some slack to have a positive effect. The next 
surprise came when additional production facilities were introduced to provide some 
slack, but MAS became barely better by comparison. The reason found this time was that 
all machines followed the same disturbance profile. Only after introducing large 
variations in the profile did MAS become significantly superior. And only then did 
increases in the other factors that influence the complexity of the planning task 
demonstrate the benefits of MAS as well, although to varying degrees. 

To quote from [7]: “Two key features [..] explain the superiority of MAS in 
a turbulent production environment [..]. First of all, MAS have the ability to follow 
good results. Due to the short planning horizon, the machine agents are able to 
consider time-dependent planning variables for their ratings, which leads to more 
precise results. On the other hand, the waiting queues of the lines are handled more 
efficiently when disturbances occur. [..] the medium throughput times and its standard 
deviations are smaller. The second factor is important with respect to the 
predictability of the results.” 








2.4 Refining the Hypothesis 

The lesson drawn from the experiments is that a large situation space is a necessary 
but not sufficient condition for the utility of MAS. It seems that if there is no decision 
space commensurate with the situation space there is too little discriminative power 
for a MAS to become effective. Hence we refine the first part of our hypothesis: 

Hypothesis: Multiagent systems offer an advantage if 

• the range of environmental situations (the problem space) is too large to be 
enumerated and dealt with by conventional means, 

• the range of decisions (the solution space) for responding is commensurate in 
size with the problem space, 

• the problem space can be divided into sets of simpler tasks, each requiring 
specialized competence, 

• the simpler tasks can be dealt with autonomously by individual agents, 

• the overall situation can only be solved by cooperation among the agents. 

Note that the experiment used the last three parts of the hypothesis as a premise rather 
than testing them. Note also that the hypothesis is fairly abstract. It says little about what 
the problem and solution spaces are in a specific application scenario, nor what 
would be large and commensurate sizes. Consequently, the hypothesis allows little 
more than a first estimate of whether to consider a MAS as a reasonable alternative. 
As this section has shown, a closer inspection by means of a simulation model should 
precede the commitment to a MAS solution, at least for bigger, more costly systems. 



3 Making Flexibility Reliable 

3.1 Behavioral Abstraction 

Our next objective is to make autonomy and flexibility themselves reliable in the sense that 
they never go beyond the given bounds. Even if implemented according to specification, 
agents may be subjected to technical disturbances that may cause them to enter 
forbidden behavioral territory. Therefore, we examine how agents can be made robust 
in the presence of technical disturbances. 

But even if all agents in a MAS can be made robust the MAS as a whole may still 
misbehave if the communication between agents is disrupted or corrupted. If we 
assume that the collaboration between agents follows a script or protocol that reflects 
a specific task - referred to as a conversation -, robustness must be extended beyond 
the individual agents to the conversation. 

Probably the best-known behavioral abstraction that includes robustness is the 
transaction. Autonomous and situated behavior is unthinkable without keeping an 
internal model of the environment and perhaps of the past history, i.e., agents must 
maintain and update an internal state. Hence, the basis for any robustness is 
non-volatility of the internal state. Therefore, agents will carry their own database. For 
robustness purposes the behavioral abstraction is the database transaction [9]. In its 
purest form the transaction has the ACID properties: It is atomic, i.e., is executed 
completely or not at all; it is consistent, i.e., if executed to completion it performs a 
legitimate state transition; it is isolated, i.e., remains unaffected by any parallel 
transactions; and it is durable, i.e., the state reached can only be changed explicitly. 
Transaction systems are mechanistic systems that guarantee these properties even in 
the presence of failures, and do so without any knowledge of the algorithms 
underlying the transactions. 

The behavioral abstraction for the conversation is the distributed transaction, i.e., a 
transaction that coordinates the transactions in the individual nodes in such a way that 
certain overall properties are guaranteed, again in the presence of failures, which 
may now include network failures. ACID properties are difficult to maintain even 
under modest degrees of autonomy and hence are usually relaxed [9]. Consequently, 
it seems that to make conversations robust even under strong autonomy of individual 
agents requires a type of distributed transaction where guarantees are weak and the 
agents may themselves have to take corrective action. 



3.2 Agents 

A possible solution for the agents is taken from [10], [11]. To obtain the necessary 
autonomy and flexibility in reacting to the situation at hand and taking decisions, an 
architecture based on the BDI theory is chosen [12], [13]. The architecture is 
organized into three layers. The lowest layer is responsible for the reactive behavior 
and communicates with the environment including the other agents. The middle layer 
does the planning and accounts for deliberations on how to respond to perceived 
events in the environment. The uppermost layer takes the widest context into account 
and plans the cooperation with other agents. Accordingly, the agent database is 
hierarchically structured into corresponding models: the world model, the mental 
model, and the cooperation model (Figure 3). 

If control flow were purely sequential, starting with sensor input on the lowest level 
and, if need be, progressing upwards to the deliberation layer or even to the 
cooperation layer, a flat ACID transaction would suffice. But flexibility demands 
more complicated dynamics. A decision may very well be taken locally on the 
reaction layer, resulting in some change to the database. The planning layer may 
recognize the change as affecting its own behavior, and may initiate a process local to 
this layer. The effect may continue into the cooperation layer. The upper layers may 
in turn initiate actions on the next lower levels. 

As a consequence, the robustness properties of the agents must be modeled as 
something more complicated than flat transactions. First, the behavioral abstraction 
should reflect the fact that an external event may spawn several actions that are on 
different layers but nonetheless interrelated. If we model each action as a transaction 
then their interrelationship is most naturally modeled as a nested transaction. Under a 
nested transaction model, a transaction can launch any number of subtransactions, 
thus forming a transaction tree (Figure 4). In its strictest form, a transaction cannot 
commit unless all its children are terminated. On the other hand, if one of the children 
fails, the parent has the choice between aborting the subtransaction, retrying it or 
launching another compensating subtransaction. However, no matter what happens 
each transaction in the tree is guaranteed to execute atomically. 
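
The choices open to a parent can be made concrete with a small in-memory sketch. It assumes that the agent state is a plain dictionary, that atomicity is obtained by snapshot and restore, and that a single compensating subtransaction is supplied by the caller; `run_atomically`, `run_child` and `run_parent` are hypothetical names, not part of the architecture in [10], [11].

```python
import copy


class Abort(Exception):
    """Signals that a (sub)transaction could not complete."""


def run_atomically(state, action):
    """Execute `action` on `state` completely or not at all."""
    snapshot = copy.deepcopy(state)
    try:
        action(state)
    except Exception as exc:
        state.clear()
        state.update(snapshot)          # undo any partial effects
        raise Abort(str(exc)) from exc


def run_child(state, child, retries, compensating):
    """Retry a failing child; then fall back to a compensating one."""
    for _ in range(retries + 1):
        try:
            run_atomically(state, child)
            return
        except Abort:
            pass                        # the child left no trace; try again
    if compensating is None:
        raise Abort("child failed and no compensating subtransaction given")
    run_atomically(state, compensating)


def run_parent(state, children, retries=1, compensating=None):
    """The parent commits only after all its children have terminated."""
    snapshot = copy.deepcopy(state)
    try:
        for child in children:
            run_child(state, child, retries, compensating)
    except Abort:
        state.clear()
        state.update(snapshot)          # abort the whole subtree
        raise
    return state
```

Whatever path is taken, every child either contributes its full effect or none at all, which is the atomicity guarantee stated above.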







Fig. 3. INTERRAP architecture (figure not reproduced; labels: agent knowledge base, 
cooperative planning layer, local planning layer, behavior based layer; SG: situation 
recognition and goal activation; PS: planning, scheduling and execution; arrows for 
information access and control flow) 

Fig. 4. Nested agent transaction (figure not reproduced; legend: action node, control 
node, synchronization node, parallel execution, sequential execution, alternate execution) 



There are two kinds of nested transactions. In closed nested transactions 
a subtransaction leaves its changes to the database invisible to all but the parent and 
the siblings. This allows a committed subtransaction to be invalidated if its parent 
fails. This is too restrictive a property considering the flexibility of spawning new 
transactions as we move up or down the layers. Hence we choose as the behavioral 
abstraction the open nested transaction model. In it a committed subtransaction makes 
its results available to all other subtransactions as soon as it commits. This leaves 
compensation as the only way to mitigate its effects if a parent fails. 
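
The consequence of the open model, namely that committed work can only be compensated and no longer rolled back, is sketched below. The class and its methods are hypothetical; the shared database is assumed to be a plain dictionary, and each subtransaction is assumed to come with a matching compensation supplied by the caller.

```python
class OpenNestedTransaction:
    """Sketch of an open nested transaction over a shared dict.

    Each subtransaction commits immediately, so its result is visible
    to everyone; a matching compensation is recorded for a later undo.
    """

    def __init__(self, db):
        self.db = db
        self._compensations = []

    def run_sub(self, action, compensation):
        action(self.db)                    # committed and visible at once
        self._compensations.append(compensation)

    def fail(self):
        # The parent fails: committed work cannot be rolled back, so the
        # compensations are executed in reverse order of commitment.
        while self._compensations:
            self._compensations.pop()(self.db)
```

Between `run_sub` and `fail` other subtransactions may already have read the committed values; compensation restores the database but cannot undo such reads, which is the price of the relaxed isolation.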

Agents are supposed to act flexibly. Clearly then, the fixed regime of a classical 
transaction would be counterproductive. Instead, the nested transactions should 
evolve dynamically as control progresses through the agent. Consequently, we 
employ a restricted form of nested transaction (Figure 4): Actions take place 
exclusively in the leaves, whereas intermediate nodes just control the execution within their 
subtrees whose subtransactions may execute sequentially, in parallel or alternatively. 


















We note that in the nested transaction model execution starts from the root on 
down. Consequently, we need a root transaction, and the only place for it is on the 
cooperation layer. On the other hand, the agent receives its input on the reactive 
behavior layer, that is, on the lowest layer. Therefore, to respond the agent must first 
identify the corresponding root transaction and, hence, the nested transaction itself. 
Suppose that as a result of some input the agent starts the left-hand transaction of 
Figure 5. As control progresses through the agent, the agent will augment the 
transaction tree. For example, after the transaction reaches the control node the agent 
augments it by a subtree to which transaction control is subsequently passed. This 
very mechanism may be used in case of failures, because the agent may simply 
modify the tree to include compensating subtransactions. In summary, the set of 
initial trees can be interpreted as a set of plans that define the possible behavior of an 
agent and determine its autonomy and flexibility coupled with robustness guarantees. 
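
A minimal sketch of such an evolving tree follows. It assumes that leaves carry actions returning success or failure, that control nodes are either sequential or alternative (parallel execution is left out for brevity), and that a control node may carry an `expand` function that supplies its subtree only when control reaches it. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """A leaf if `action` is set; otherwise a control node."""
    name: str
    mode: str = "sequential"            # or "alternative"
    action: Optional[Callable[[dict], bool]] = None
    children: List["Node"] = field(default_factory=list)
    expand: Optional[Callable[[dict], List["Node"]]] = None


def execute(node, state, trace):
    """Run the tree from the root down; returns True on success."""
    if node.action is not None:         # leaf: the only place for actions
        trace.append(node.name)
        return node.action(state)
    if node.expand is not None and not node.children:
        # The tree evolves: the control node is augmented by a subtree
        # chosen from the situation at hand when control reaches it.
        node.children = node.expand(state)
    if node.mode == "alternative":
        return any(execute(c, state, trace) for c in node.children)
    return all(execute(c, state, trace) for c in node.children)
```

In this reading an initial tree is a plan, and the `expand` hook is the place where the agent adds further subtrees, including compensating ones after a failure.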





Fig. 5. Evolution of a nested agent transaction 



3.3 Agent Cooperation 

An important part of our hypothesis is that agents must cooperate to solve an overall 
situation. Cooperation should follow some protocol. Execution of the protocol is itself 
subject to technical disturbances. We have experimented with several approaches to 
make cooperation robust. 



Agent synchronization. A seemingly simple approach starts from the premise that 
nested agent transactions should suffice to guarantee reliability, provided we can 
augment them such that they can also deal with failures during the communication 
with other agents. To do so they must be able to exert some control over the 
communication. Such a control regime is commonly referred to as synchronization. 
Figure 6 illustrates the principle. 

To enable such a controlled synchronization, the nested transaction for an 
individual agent is augmented by special synchronization nodes. Figure 4 shows such 
a node (indicated by the arrow). 

Figure 7 illustrates the synchronization for two separate transactions. M n not 
only starts subtransactions M m and M u , but also wakes up subtransaction S u . In turn, 
M n cannot continue until S n has finished and S, has regained control. For full 
compensation support, two further pairs of synchronization nodes are needed. Pair (3) 
prevents the termination of the slave transaction tree before the termination of the 
master transaction tree and pair (4) causes the compensation of the slave subtree S, in 
case M n fails. More common, however, will be simpler master-slave situations. 




Fig. 6. Synchronization of nested agent transactions 



In a physically centralized environment it makes sense to base the synchronization, 
like the nested transactions, on a common database. A well-known technique to deal 
with asynchronous events in databases is the use of Event-Condition-Action rules (ECA rules). 
Database systems with this capability are known as active database systems [14]. This 
is a very powerful mechanism and allows for a broad variety of cooperation schemes 
over and above the task delegation in Figure 7. A serious drawback of the approach, 
though, lies in the need for handling the ECA rules explicitly in the agent behavior. In 
combination with the evolutionary character of the nested transactions, including the 
frequent need for compensating subtransactions, the complexity of nested agent 
transactions may become unmanageably high. 
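
The ECA principle can be illustrated with a toy store. The sketch assumes that the only event type is the update of a key and that rules are held in the store itself; it stands in for, and is much simpler than, the trigger mechanism of a real active database system. `ActiveStore` and the state names used in the test are hypothetical.

```python
class ActiveStore:
    """Toy key-value store that fires Event-Condition-Action rules."""

    def __init__(self):
        self.data = {}
        self.rules = []          # (event, condition, action) triples

    def on(self, event, condition, action):
        """Register a rule for updates of the key named `event`."""
        self.rules.append((event, condition, action))

    def put(self, key, value):
        self.data[key] = value
        # Event: an update of `key`; evaluate every matching rule.
        for event, condition, action in list(self.rules):
            if event == key and condition(self.data):
                action(self)
```

A master transaction that writes its state could in this way wake up a slave, or trigger the slave's compensation, without calling it directly. The drawback named above is visible even here: every such rule has to be written and maintained as part of the agent behavior.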



Transactional conversations. The difficulties of the previous approach suggest 
another approach where the cooperation (like the one of Figure 2) is concentrated 
within a separate protocol and is protected by its own - now distributed - transaction. 
We refer to such a protocol as a conversation, and since it is protected by a 
transaction, as a transactional conversation. Figure 8 illustrates the principle. 

Clearly, there are action nodes that participate in both the nested agent transaction and 
in the transactional conversation. While this may conceptually be in order, technically 
the transactions need to be separate. Their interrelation can only be via a common 
database structure. In order to gain a measure of control we restrict the (conceptual) 
intersection of the transactions to the leaf nodes. As a consequence, though, 
transactional conversations must be ACID. 

Figure 9 outlines our technical solution [15]. Conversations are handled 
transparently by wrapping the tasks that handle the protocol execution within the 
communicating agents by transactional tasks (ta-tasks) that observe this execution and 
the occurrence of local state changes. Consequently, conversations that represent 
instantiations of such protocols are mapped to distributed transactions. As a result, the 
changes in the states of the communicating agents now are synchronized via the 
success or failure of the overall distributed transaction. The transaction manager is 
integrated in the multiagent platform so that as usual, depending on the success or 
failure of the interrelated user-tasks, a global commit or rollback can be initiated. 
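
The all-or-nothing outcome of a conversation can be sketched as follows. The sketch assumes a coordinator in the style of two-phase commit, which is an assumption of this illustration since the commit protocol is not named here, and it models each agent state as a dictionary. `TaTask` and `run_conversation` are hypothetical names and do not reproduce the platform integration of [15].

```python
import copy


class TaTask:
    """Hypothetical transactional wrapper around one agent's protocol task."""

    def __init__(self, agent_state, task):
        self.agent_state = agent_state
        self.task = task
        self._tentative = None

    def prepare(self):
        # Run the task on a private copy; vote by returning True or False.
        self._tentative = copy.deepcopy(self.agent_state)
        try:
            self.task(self._tentative)
            return True
        except Exception:
            return False

    def commit(self):
        # Make the tentative state the agent's actual state.
        self.agent_state.clear()
        self.agent_state.update(self._tentative)

    def rollback(self):
        # Discard the tentative state; the agent state was never touched.
        self._tentative = None


def run_conversation(ta_tasks):
    """Global commit only if every wrapped task succeeded."""
    votes = [t.prepare() for t in ta_tasks]
    if all(votes):
        for t in ta_tasks:
            t.commit()
        return "commit"
    for t in ta_tasks:
        t.rollback()
    return "rollback"
```

The state changes of all participating agents thus become visible together or not at all.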









Fig. 7. Two synchronized agent transactions 




There is also a drawback to this approach. ACID imposes a rigidity that stands in 
drastic contrast to the desired flexibility of MAS as it manifests itself in the 
evolutionary nested transactions. On the other hand, it does not seem advisable to do 
away with the ACID properties for conversations in order for the system to remain 
manageable. In addition it seems a bit unnatural to place cooperation in the leaf nodes 
rather than in the root nodes. 

Messaging. The standard high-level solution for asynchronous communication is 
message queuing. There is a growing tendency to make message queues 
persistent and to base them on database technology [16]. Persistent message queues (or 
message-oriented middleware, MOM, as it is often referred to) give certain guarantees 
regarding safe transmission, and one may request further guarantees such as 
notification, reproducibility of message exchange, and non-repudiation. Based on 
these properties one may develop cooperation protocols that may even evolve 
dynamically. 




Fig. 9. Transactional conversations 

We have done first experiments with this third approach. Conversations, and hence 
protocols, are composed of so-called speech acts for which a formal semantics exists. 
For each message a speech act keyword, such as “confirm” or “propose”, defines the 
impact of the message content on the states of the sender and the receiver [17]. The 
meaning of each keyword is given by a pair of precondition and postcondition, 
both specified in a first order modal logic [18]. In our current work, we investigate 
what impact a commit and a rollback according to the speech act semantics should 
have and especially how to exploit the speech act preconditions for the necessary 
compensation actions in case of limited isolation. 
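
The pairing of keyword, precondition and postcondition can be sketched in a few lines. The sketch replaces the modal-logic formulas by ordinary Python predicates over belief sets, which is a strong simplification; the `CONFIRM` definition is an illustrative stand-in and not the formal semantics of [17], [18].

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SpeechAct:
    keyword: str
    precondition: Callable[[dict, dict, str], bool]   # (sender, receiver, content)
    postcondition: Callable[[dict, dict, str], None]  # effect on the two states


def send(act, sender, receiver, content):
    """Apply the act only if its precondition holds; return success."""
    if not act.precondition(sender, receiver, content):
        return False
    act.postcondition(sender, receiver, content)
    return True


# Simplified stand-in: the sender may confirm only what it believes itself,
# and afterwards the receiver believes it too.
CONFIRM = SpeechAct(
    keyword="confirm",
    precondition=lambda s, r, p: p in s["beliefs"],
    postcondition=lambda s, r, p: r["beliefs"].add(p),
)
```

Since the precondition records what had to hold before the message was sent, it is a natural starting point for deriving a compensating action when a message has to be undone.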

The local portions of the protocol can easily be integrated with the nested 
transaction trees. Each portion consists of speech acts that can be arranged in tree 
form. Evolving transaction trees may thus be augmented by the protocol trees which 
then can be subjected to the local transaction regime. Hence, each single message 
within the conversation rather than the whole conversation is now considered atomic. 
Transaction trees now include the control flow within the protocols and between 
different protocols, while the submission of a single message is treated as an action 
with ACID properties and, consequently, must be mapped to a leaf node. 
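
Treating the submission of a single message as an atomic leaf action can be sketched with a database-backed queue. The example uses Python's standard `sqlite3` module as a stand-in for a persistent message queue; the table layout and the function names are assumptions made for illustration.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the message queue and the sender's state table."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS queue "
               "(id INTEGER PRIMARY KEY, receiver TEXT, act TEXT, content TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS state "
               "(key TEXT PRIMARY KEY, value TEXT)")
    db.commit()
    return db


def send_atomically(db, receiver, act, content, state_key, state_value):
    """Leaf action: enqueue one message and record the sender's state change
    in a single local transaction, so that both happen or neither does."""
    with db:   # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO queue (receiver, act, content) VALUES (?, ?, ?)",
                   (receiver, act, content))
        db.execute("INSERT INTO state (key, value) VALUES (?, ?)",
                   (state_key, state_value))
```

If the state update fails, the message never appears in the queue, so the receiver cannot observe a message whose local effect on the sender was lost.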










4 Conclusion 

Are multiagent systems a panacea, a cure-all for all problems that require flexible 
software? As discussed in Section 1, multiagent systems exact a heavy price in that 
system-level behavior often cannot be predicted analytically from the description of 
individual agents but must be observed in simulation or real-life. Hence, one should 
be extremely restrictive when it comes to deciding on the application of a multiagent 
software system. We started from a hypothesis on the qualitative characteristics that 
a situation should exhibit in order to treat multiagent software as a candidate solution. 
We reported on a simulation study for a production scenario which confirmed but also 
refined the hypothesis. Basically, MAS can be recommended for large problem and 
solution spaces, where the problem space is non-deterministic or so ill-structured as to 
appear non-deterministic. As members of a national research initiative on multiagent 
systems we observe that one particular area where these characteristics seem to 
predominate is health care. 

We claimed that even these benefits depend on agents that have industrial-strength 
properties. But on closer scrutiny doubts arise. Industrial strength is based on 
transactional properties. These require an extensive and expensive infrastructure 
consisting of both a database manager and a transaction manager. Agents should 
therefore be placed on heavy-weight nodes or should all have access to the same 
centralized server structure. Moreover, the transaction managers must support 
distributed transactions, and the database managers must include active mechanisms 
such as triggers. In order not to complicate the task one would at least try to avoid 
interoperability problems and keep the distributed infrastructure homogeneous. For 
example, one should require all infrastructure to be FIPA compliant [19]. 

The nested transaction model raises additional problems of a semantic nature. The 
set of initial transaction trees is related to the plans of an agent, and the evolution of 
a tree to the reactive behavior of an agent. Consequently, as opposed to traditional 
database transactions, agent transactions cannot be imposed orthogonally but become 
part of the agent design proper. Further, it proved extremely complex to extend the 
reliability to the cooperation between agents. Either the transaction trees have to be 
made even more complicated by including synchronization or messaging mechanisms 
in the tree, or the cooperation protocols are concentrated in transactional 
conversations that impose a certain rigidity through their ACID properties. 
Robustness of agents still seems to pose considerable research challenges. 

To separate the issues of normal agent behavior and agent robustness we have 
experimented with splitting the agent into two - albeit interrelated - parts, a regular 
domain agent responsible for, among other things, planning of the transactions, and a 
transaction agent (Figure 10). 

Another challenge has to do with the details of a transaction tree. It should be possible 
to generate it from a more abstract description by the software engineer of the 
behavior, desired autonomy and flexibility of an agent, somewhat similar to the 
descriptive query languages for database access that just describe the properties of the 
desired result. Indeed, if according to the hypothesis the problem space is too large to 
be enumerated and may not even be completely known at system development time, 
an abstract characterization is the only viable alternative. Examples are predicative 
expressions or sets of production rules. We refer to such characterizations as 
descriptive characterizations. Since we expect decision spaces of a similar size, these 
will also have to be characterized descriptively. 





Fig. 10. Semi-orthogonal organization of agent software 



Descriptive characterizations of agents are nothing new to the MAS community.
Indeed, agents in artificial intelligence follow fairly complicated descriptions, often
on the basis of complex architectures such as the BDI (belief-desire-intention)
architecture. But since we pursue the more modest goal of making software systems
more flexible, we should insist on simpler description techniques that are acceptable
to the practitioner, for example because they relate to the application domain. In our
shop floor scenario, priority rule-based machine scheduling methods assign a certain
value to the different production tasks that compete for the same machine (or other
resource). This value is directly derived from characteristic value ratios and forms the
basis for the construction of the execution sequence of the tasks. Already in the
mid-seventies more than one hundred priority rules were systemized and described in
the literature [20]. An open research issue for agent technology would then be the
mapping from the global priority rules to the local agent behavior.
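
To make the idea of priority rule-based sequencing concrete, the following is a minimal sketch, not the scheduling method used in the project. It assumes one particular ratio-based rule (the critical ratio: time remaining until the due date divided by processing time); the `Task` fields, the function names and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    processing_time: float  # time the task occupies the machine
    due_date: float         # absolute due date of the task


def critical_ratio(task, now):
    """Priority value of a task: time left until the due date per unit of work.

    A smaller ratio means a more urgent task (assumed rule, for illustration).
    """
    return (task.due_date - now) / task.processing_time


def sequence_tasks(tasks, now=0.0):
    """Build the execution sequence for one machine.

    The task with the smallest priority value is dispatched next; the clock
    then advances by its processing time and the remaining tasks are re-rated.
    """
    pending = list(tasks)
    order = []
    clock = now
    while pending:
        nxt = min(pending, key=lambda t: critical_ratio(t, clock))
        pending.remove(nxt)
        order.append(nxt.name)
        clock += nxt.processing_time
    return order


tasks = [
    Task("drill", processing_time=2.0, due_date=10.0),
    Task("mill", processing_time=4.0, due_date=8.0),
    Task("paint", processing_time=1.0, due_date=12.0),
]
print(sequence_tasks(tasks))
```

The rule itself is global knowledge about the shop floor; the open issue named above is how such a rule would be turned into the local decisions of individual agents.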









Acknowledgement. Part of this research was funded by Deutsche
Forschungsgemeinschaft contract no. Lo 296/17-2 within the National Research
Initiative 1083.



References 

1. Weiss, G. (ed): Multiagent Systems - A Modern Approach to Distributed Artificial 
Intelligence. MIT Press (1999) 

2. Wooldridge, M.: Intelligent Agents. In [1], 27-7 

3. White, J.E.: Telescript Technology: The Foundation for the Electronic Marketplace. 
General Magic White Paper (1994) 

4. Lange, D. B., Oshima, M.: Agents in E-commerce - Seven Good Reasons for Mobile 
Agents. Communications of the ACM 42:3, ACM (1999) 88-91 

5. Parunak, H.V.D.: Industrial and Practical Applications of DAI. In [1], 377-421

6. Durfee, E.H.: Distributed Problem Solving and Planning. In [1], 121-164

7. Frey, D., Nimis, J., Wörn, H., Lockemann, P.C.: Benchmarking and Robust Multi-Agent
Based Production Planning and Control. Engineering Appl. of Artificial Intelligence 16:4,
Elsevier (2003) 307-320

8. Smith, R.G.: The Contract Net Protocol: High-Level Communication and Control in
a Distributed Problem Solver. IEEE Transactions on Computers 29:12, IEEE (1980)
1104-1113

9. Weikum, G., Vossen, G.: Transactional Information Systems. Morgan Kaufmann (2002)

10. Nagi, K., Lockemann, P.: Implementation Model for Agents with Layered Architectures in
a Transactional Database Environment. Proc. 1st Int. Bi-Conf. Workshop on
Agent-Oriented Information Systems (AOIS'99). (1999)

11. Nagi, K.: Transactional Agents - Towards a Robust Multi-Agent System. Lect. Notes in
Computer Science, Vol. 2249. Springer (2001)

12. Bratman, M.: Intentions, Plans and Practical Reason. Harvard University Press (1987)

13. Mueller, J.: The Design of Intelligent Agents: A Layered Approach. Lect. Notes on
Artificial Intelligence, Vol. 1177. Springer (1996)

14. Widom, J., Ceri, S. (eds.): Active Database Systems. Morgan Kaufmann (1996)

15. Vogt, R.: Einbettung eines transaktionsgestützten Robustheitsdienstes in das
FIPA-Agentenrahmenwerk. Diploma thesis at Universität Karlsruhe (TH). Karlsruhe (2001)

16. Lockemann, P.C., Dittrich, K.R.: Architecture of Database Systems. dpunkt.verlag (2004)
(in German)

17. Foundation for Intelligent Physical Agents: FIPA Communicative Act Library
Specification. (2000)

18. Sadek, M. D.: Attitudes Mentales et Interaction Rationnelle: Vers une Théorie Formelle de
la Communication. Thèse de Doctorat Informatique, Université de Rennes I, France.
(1991)

19. Foundation for Intelligent Physical Agents: Specification Repository. Available at
http://www.fipa.org/repository/index.html (11-05-2003)

20. Panwalkar, S. S., Iskander, W.: A Survey of Scheduling Rules. Operations Research 
25,1 (1977) 




World Wide Web Challenges: 
Supporting Users in Search and Navigation 



Natasa Milic-Frayling 

Microsoft Research Ltd, Roger Needham Building, 7 J J Thomson Avenue 
Cambridge CB3 0FB, United Kingdom
natasamf@microsoft.com



Abstract. The World Wide Web poses many challenges for designing effective
information services and applications. Addressing these challenges requires 
good understanding of Web characteristics and their implications. In this 
presentation we focus on selected aspects of the Web that affect searching and 
navigation by the users. We offer our insights into the issues and present our 
preferred strategies for addressing them. We demonstrate prototype systems 
that illustrate our solutions and discuss system evaluations that we have 
conducted. 



1 Introduction 

The World Wide Web is a highly distributed and dynamic information environment. Users
access information through a variety of devices, performing a wide spectrum of user
tasks. This poses many challenges for designing effective information services and
applications. Addressing these challenges requires a good understanding of Web
characteristics and their implications for the design and usability of services and
applications.

Here we focus on three aspects of the Web that impact the user’s experience during 
search and navigation [1]: (a) separation of search and document delivery, 
(b) separation of document authoring and creation of metadata that is required by 
services and applications, and (c) lack of a generic publishing format that supports 
flexible viewing of the Web content on a variety of devices. We briefly state the 
problems and describe our recommendations and prototypes that illustrate the 
possible solutions. 



2 Separation of Search and Document Delivery 

With regard to information seeking on the Web, it is important to note that processing of
users' requests for information and delivery of documents in response to these
requests are typically two disjoint processes. Indeed, services like Web search and
online directories typically provide the user with URLs of information sources rather
than the documents themselves. Documents are hosted on Web site servers, outside the
control of the service itself. This separation places particular importance on the
design of the user interface for accessing the Web, the Web browser.

P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 57-59, 2004.
© Springer-Verlag Berlin Heidelberg 2004

Indeed, the browser is the first-layer interface, hosted on the client side, which
exposes the second-layer interface, the Web pages of the service, e.g., a search page,
that is designed and managed by the service team. In the search scenario, for example,
the user enters the query into the search box and sends it for processing to a search 
engine. The browser then displays the search results and delivers the document for the 
URL selected by the user. Thus, the browser is involved in all essential stages and 
should provide bridges between them by creating and maintaining a rich context of 
the user’s activities. 

As an illustration of this recommendation we designed and implemented
a prototype system, MS Read [2], that captures and applies the search context. It
captures the user's queries, creates a representation of the search topic through
linguistic analysis of the query or subsequent refinement by the user, and provides 
visual feedback on the matches between the topic and the delivered documents 
through term highlighting. The user’s topic is also used to analyze the text of the 
linked documents for the currently viewed page. Essentially, the browser has been 
extended with natural language processing capabilities and indexing of viewed or 
linked pages. These added facilities provide foundations for rich client side 
processing. With visual features, such as thumbnails of full pages and term highlights, 
the enhanced browser provides more effective support for searching and browsing. 
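
As a rough illustration of the term highlighting described above, here is a minimal sketch. It is not the MS Read implementation: the topic is reduced to whitespace-separated query words, whereas MS Read derives it through linguistic analysis, and the function name and the `**` marker are assumptions made for the example.

```python
import re


def highlight_terms(text, query, marker="**"):
    """Wrap each whole-word, case-insensitive match of a query term in markers."""
    # Escape the terms so that punctuation in a query is matched literally.
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: marker + m.group(0) + marker, text)


page_text = "Fuzzy systems and neural networks"
print(highlight_terms(page_text, "fuzzy networks"))
```

A browser extension would apply the same matching step to the text nodes of the delivered page and, as the section notes, to the text of linked pages.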



3 Separation of Document Authoring and Metadata Creation 

Web services that involve a representation of the entire or a large portion of the Web 
often resort to Web crawling and centralized processing of the collected content. They 
create metadata, such as searchable indices, that are then used to facilitate the service. 
However, as the authors continue to update their Web sites, this metadata quickly 
becomes out of sync with the source documents. Thus, centralized Web services 
essentially work with the data representations that are out of date. 

Furthermore, because of the sheer scale of the Web and the need for fast and 
frequent crawling, Web services can apply only relatively simplistic content analyses. 
As a consequence, information they provide to the users about the content and 
structure of the Web sites is typically suboptimal. 

We thus promote the idea of generating information about the Web site and page 
structures at the authoring or publishing stage, and providing that information to 
applications and services upon request. This will potentially eliminate the need for 
collecting the Web documents (they are not delivered to the user except when cached 
pages are made available). The services would crawl the Web for metadata instead. 

An immediate benefit of this distributed metadata creation model is the ability to 
generate and supply more sophisticated content and structure analyses. Furthermore, 
if a mechanism is provided for sites to ‘push’ their indices onto the services right 
upon authoring or updating, this helps with the issue of outdated indices. 

In support of this idea we present the MIDAS framework [3] (Meta-Information
Delivery and Annotation Services), which we implemented to enable creation,
distribution, and utilization of metadata about the structure and content of Web sites
and pages. We illustrate how MIDAS can be used to enhance the user's experience
during search and navigation and highlight the way it complements the Semantic Web
effort.



4 Flexible Viewing of Web Content across Devices 

Use of the Web on mobile devices, e.g., Personal Digital Assistants (PDAs) and
Internet-enabled mobile phones, poses further challenges. Viewing Web pages of complex
layout structure is incompatible with the restricted screen space. Such pages assume a
fixed standard size of the browser display and are not automatically modified to fit the
size of devices with small screens. As a result, viewing such pages requires extensive
horizontal and vertical scrolling. This can lead to disorientation while reading and
cause difficulty with identifying relevant parts of the page during search.

We describe and demonstrate SmartView [4] and SearchMobil [5], two 
applications that support browsing, reading, and searching of Web documents on 
small devices. They involve analyses of HTML pages and their decomposition into 
logical units that could be selected for individual viewing. The user can choose to 
view a graphical overview of the page in the form of a thumbnail image, indicating 
the partition into page segments. As the user selects a segment for viewing, the 
corresponding content is automatically reformatted to support reading with no 
horizontal scrolling. In the search context, SearchMobil provides highlighting of hits 
and assessment of individual page segments for their relevance to the user’s topic. 

We report on the usability studies performed with the implemented systems [6] and
discuss how mobile device applications in general could be effectively supported by
the MIDAS framework.



References 

1. Milic-Frayling, N., Sommerer, R.: Enhanced Web Publishing: Towards Integration of
Search and Navigation. In the Proceedings of Libraries in the Digital Age (LIDA)
Conference (2003)

2. Milic-Frayling, N., Sommerer, R.: MS Read: Context Sensitive Document Analysis in the
WWW Environment. Microsoft Technical Report, MSR-TR-2001-63 (2001)

3. Milic-Frayling, N., Sommerer, R., Smyth, G.: MIDAS: Towards Rich Site Structure and
Content Metadata. Poster, On-line Proceedings of the 12th World Wide Web Conference,
Budapest (2003)

4. Milic-Frayling, N., Sommerer, R.: SmartView: Flexible Viewing of Web Page Contents.
Poster, On-line Proceedings of the 11th World Wide Web Conference (2002)

5. Milic-Frayling, N., Sommerer, R., Rodden, K., Blackwell, A.: SearchMobil: Web Viewing
and Search for Mobile Devices. Poster, On-line Proceedings of the 12th World Wide Web
Conference, Budapest (2003)

6. Rodden, K., Milic-Frayling, N., Sommerer, R., Blackwell, A.: Effective Web Searching on
Mobile Devices. In the Proceedings of HCI 2003, Bath, United Kingdom (2003)




Querying and Viewing the Semantic Web 
An RDF-Based Perspective 



Dimitris Plexousakis 



Department of Computer Science, University of Crete, and 
Institute of Computer Science, Foundation for Research and Technology Hellas 
dp@ics.forth.gr



Abstract. Real-scale Semantic Web applications, such as Knowledge Portals 
and E-Marketplaces, require the management of voluminous repositories of 
resource metadata. At the same time, personalized access and content 
syndication involving diverse conceptual representations of information 
resources are key challenges for such applications. The Resource Description 
Framework (RDF) enables the creation and exchange of metadata as any other 
Web data and constitutes nowadays the core language for creating and 
exchanging resource descriptions worldwide. Although large volumes of RDF 
descriptions are already appearing, sufficiently expressive declarative query 
languages for RDF and full-fledged view definition languages are still missing. 



We present RQL, a new query language adapting the functionality of semistructured
or XML query languages to the peculiarities of RDF, but also extending this
functionality in order to uniformly query both RDF descriptions and schemas. RQL is
a typed language that follows a functional approach and relies on a formal graph model
that permits the interpretation of superimposed resource descriptions created using
one or more schemas. We illustrate the syntax, semantics and type systems of RQL 
and report on the performance of RSSDB, our persistent RDF Store, for storing and 
querying voluminous RDF metadata. RQL and RSSDB are part of RDFSuite, a set of 
scalable tools developed at ICS FORTH for managing voluminous RDF/S description 
bases and schemas. 

We also propose RVL, a view definition language capable of creating not only
virtual resource descriptions, but also virtual RDF/S schemas from (meta-) classes,
properties, as well as resource descriptions available on the Semantic Web. RVL
exploits the functional nature and type system of the RQL query language in order to 
navigate, filter and restructure complex RDF/S schema and resource description 
graphs. Last, but not least, we address the problem of integrating legacy data sources 
using SWIM, a Datalog-based framework for mediating high-level queries to 
relational and/or XML sources using ontologies expressed in RDF/S. 



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 60-61, 2004.
© Springer-Verlag Berlin Heidelberg 2004 







References 



1. Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D., Tolle, K.: On Storing 
Voluminous RDF Descriptions: The case of Web Portal Catalogs. In: Proceedings of the 
4 th International Workshop on the Web and Databases (WebDB), Santa Barbara, CA. (2001) 

2. Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D., Tolle, K.: The ICS- 
FORTH RDF Suite: Managing Voluminous RDF Description Bases. In: Proceedings of the 
2nd International Workshop on the Semantic Web, Hong-Kong, (May 2001) 1-13 

3. Karvounarakis, G., Alexaki, S., Christophides, V., Plexousakis, D., Scholl, M.: RQL: 
A Declarative Query Language for RDF. In: Proceedings of the 11th International 
Conference on the WWW, Hawaii (2002) 592-603 

4. Karvounarakis, G., Maganaraki, A., Alexaki, S., Christophides, V., Plexousakis, D., 
Scholl, M.. Tolle, K.: Querying the Semantic Web with RQL. In: the Computer Networks 
lournal 42 , (2003) 617-640 

5. Maganaraki, A., Tannen, V., Christophides, V., Plexousakis, D.: Viewing the Semantic Web 
through RVL Lenses. In: Proceedings of the 2nd International Conference on the Semantic 
Web^Sanibel Island, Florida, (October 2003) 96-1 12 

6. Tannen, V., Christophides, V., Karvounarakis, G., Koffina, I., Kokkinidis, G., 
Maganaraki, A., Plexousakis, D.. Serfiotis, G.: The ICS-FORTH SWIM: a Powerful 
Semantic Web Integration Middleware. In: Proceedings of the Workshop on the Semantic 
Web and Databases, VLDB Conference, Berlin, September 2003 




Knowledge Acquisition and Processing: 
New Methods for Neuro-Fuzzy Systems 



Danuta Rutkowska 

Department of Computer Engineering, Technical University of Czestochowa, 
Armii Krajowej 36, 42-200 Czestochowa, Poland, 

drutko@kik.pcz.czest.pl,

Department of Artificial Intelligence, WSHE University in Lodz, 
Rewolucji 1905, 52, 90-213 Lodz, Poland 



Abstract. The paper presents some new methods of knowledge acquisition
and processing with regard to neuro-fuzzy systems. Various connectionist
architectures that reflect fuzzy IF-THEN rules are considered.
The so-called flexible neuro-fuzzy systems are described, as well as
relational systems and probabilistic neural networks. Other connectionist
systems, such as hierarchical neuro-fuzzy systems, type 2 systems, and
hybrid rough-neuro-fuzzy systems, are mentioned. Finally, the perception-based
approach, which refers to computing with words and perceptions,
is briefly outlined. Within this framework, a multi-stage classification
algorithm and a multi-expert classifier are proposed.



1 Introduction 

Various combinations of fuzzy systems and neural networks create neuro-fuzzy
systems [65]. When fuzzy systems are represented in the form of multi-layer
architectures, similar to neural networks, we have connectionist neuro-fuzzy
systems [30]. Different architectures of such systems can be considered, and a general
form that includes many special cases is presented in Section 3.

When systems of this kind solve a problem, they perform according to fuzzy
IF-THEN rules, which constitute a knowledge base. The knowledge acquisition
realized by intelligent systems is very important from the application point of view,
and this ability is a feature of intelligence.

This paper, as mentioned in the abstract, concerns various methods of knowledge
acquisition and processing in neuro-fuzzy systems. Connectionist architectures
of the systems are considered. Apart from the general architecture, flexible
neuro-fuzzy systems are described in Section 4, relational systems in Section 5,
and probabilistic neural networks in Section 6. Other systems, such as hierarchical
neuro-fuzzy systems, type 2 systems, and hybrid rough-neuro-fuzzy systems,
are mentioned in Section 7. The perception-based approach, which incorporates
computing with words and perceptions, introduced by Zadeh [43], is briefly
outlined in Section 8. A multi-stage classification algorithm and a multi-expert
classifier are proposed with regard to this approach.



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 62-81, 2004. 
© Springer-Verlag Berlin Heidelberg 2004







The methods proposed in this paper are soft computing methods [71], which
can be used in computational intelligence (and artificial intelligence) [1] in order
to construct intelligent systems. They are related to cognitive technologies, since
intelligent systems try to imitate cognitive behaviour, as neural networks
do through inductive learning.



2 Cognition and Neuro-Fuzzy Systems 

Knowledge acquisition and processing with regard to neuro-fuzzy systems can 
be viewed within the framework of cognitive technologies. Cognitive sciences 
concern thinking, perception, reasoning, creation of meaning, and other functions 
of a human mind. The word “cognition” comes from the Latin word “cognitio”,
which means “knowledge”. Rule-based systems are knowledge-based systems,
where the knowledge is represented by the rules. Connectionist architectures of 
neuro-fuzzy systems reflect fuzzy IF-THEN rules, which are contained in the 
rule base. 

The aim of artificial intelligence is to develop paradigms or algorithms that 
allow machines to perform tasks that involve cognition when performed by hu- 
mans [55]. It is probably an axiom of artificial intelligence, and modern psycho- 
logy, that intelligent behavior is rule-governed [10]. One of the most significant 
results demonstrated by Newell and Simon was that much of human problem 
solving or cognition could be expressed by IF-THEN type production rules. The 
Newell and Simon model of human problem solving in terms of long-term me- 
mory (rules), short-term memory (working memory), and a cognitive processor 
(inference engine), is the basis of modern rule-based expert systems [9]. 

Machine learning research has the potential to make a profound contribution 
to the theory and practice of expert systems, as well as to other areas of artificial 
intelligence. Its application to the problem of deriving rule sets from examples 
is already helping to circumvent the knowledge acquisition bottleneck [10]. 

Learning by examples is one of the simplest cognitive capabilities of a young
child. Artificial neural networks with an inductive, supervised learning algorithm
imitate this cognitive behaviour. The most common form of supervised learning
task is called induction. An inductive learning program is one which is capable
of learning from examples by a process of generalization [10].

Perception is very important in human cognition. The systems that incor- 
porate perceptions expressed by words are fuzzy systems, introduced by Zadeh 
[68], [69]. Fuzzy systems are rule-based systems (knowledge-based systems) that 
can be viewed as perception-based systems. The rule base of a fuzzy system is 
composed of fuzzy IF-THEN rules that are similar to rules used by humans in 
their reasoning. 

Symbolic models of reasoning, e.g. expert systems in AI, have nothing to do
with neurobiology, and are not appropriate for pattern recognition, associations,
and knowledge generalization. Artificial neural networks do not perform logical
inference, and employ an associative way of reasoning.







Hybrid systems, such as fuzzy and neural expert systems, as well as connectionist
neuro-fuzzy systems, are created as intelligent systems that possess
features of rule-based reasoning and learning ability. Evolutionary algorithms [15]
can also be incorporated into the hybrid intelligent systems.

It seems obvious that artificial neural networks, which try to imitate networks 
of biological neurons in a human brain, and perception-based fuzzy systems, 
which perform reasoning based on fuzzy IF-THEN rules, should be combined 
to create main components of intelligent systems that try to imitate human 
intelligence. 

Various neuro-fuzzy systems can be constructed by combination of neural
networks and fuzzy systems. The so-called connectionist networks are viewed
as representations of fuzzy systems in the form of connectionist nets, which
are similar to neural networks. Systems of this kind can automatically create
fuzzy IF-THEN rules based on examples, such as elements of a learning sequence
presented to neural networks. In this way, the neuro-fuzzy systems acquire knowledge
without requiring human experts to formulate the rules. Architectures of the
connectionist neuro-fuzzy systems reflect the rules. The connectionist architectures are
multi-layer architectures, like neural networks. Thus, the neuro-fuzzy systems
can be trained in a similar way to neural networks [73], [30].

The relation between neuro-fuzzy systems and cognitive technologies,
explained above, was a part of the plenary lecture, entitled “Cognition,
perception, and rule-based systems”, presented by the author during the conference
on cognitive sciences and computational intelligence, in Stara Lesna, Slovakia,
May 14-16, 2003.

3 General Form of Fuzzy Inference Neural Network 

The dominant feature of fuzzy systems is an inference process, which is based
on fuzzy logic [68], [69]. Two main approaches to the inference of fuzzy (or
neuro-fuzzy) systems are distinguished: the Mamdani approach [13], which is the
best known and most widely applied, and the logical approach [67], [6], [30], which
was developed later. Neuro-fuzzy systems based on both approaches are included in
the general multi-layer architecture of the so-called fuzzy inference neural network,
which is illustrated in Fig. 1.

The difference between the Mamdani and logical approaches concerns the 
inference and aggregation layers shown in Fig. 1. The antecedent and defuzzifi- 
cation layers are the same in both cases. The form of this neuro-fuzzy connectio- 
nist architecture is explained in detail in [41], [30]. Now, let us briefly present 
the most important information about this network. 

First, we should note that the name fuzzy inference neural network
suggests a neural network architecture that performs fuzzy
inference. As a matter of fact, this connectionist network is similar to the
multi-layer perceptron, which is the most popular artificial neural network (see
e.g. [73], [30]). However, the elements (nodes) of the fuzzy inference neural
network differ from neurons of the multi-layer perceptron. Only two elements in the
defuzzification layer, in Fig. 1, are in fact linear neurons, almost the same as
those applied in neural networks. Other elements realize different functions, e.g.
the third node in the last layer performs the division operation.

Fig. 1. General architecture of fuzzy inference neural network (layers from input to
output: antecedent layer, inference layer, aggregation layer, defuzzification layer)

It should be emphasized that the network portrayed in Fig. 1 is not a fuzzy
neural network (see [37], [30]), because the input/output signals at each layer
and the connection weights are not fuzzy. On the other hand, there are connectionist
neuro-fuzzy systems based on the generic fuzzy perceptron [18], which is a
fuzzy neural network. Examples of such systems are NEFCON, NEFCLASS,
and NEFPROX, presented in [19]. An equivalence between these neuro-fuzzy
systems and a special case of the fuzzy inference neural network illustrated in
Fig. 1, i.e. the RBF-like system, can be shown [31]. Thus, the more general name
neuro-fuzzy inference system may be used instead of the fuzzy inference neural
network. However, the word network indicates the connectionist form of the
system. Therefore, connectionist neuro-fuzzy systems are often called neuro-fuzzy
networks.

The first layer of the network portrayed in Fig. 1 is the same as in the RBF-like
system, when the nodes realize the Gaussian radial basis functions [30].
Functional equivalence between RBF neural networks and fuzzy inference systems is
shown in [11]. The nodes of the antecedent layer, in Fig. 1, realize the membership
functions of the fuzzy sets in the antecedent part of the fuzzy IF-THEN
rules, which are formulated as follows

$$R^{(k)}: \text{IF } \mathbf{x} \text{ is } A^k \text{ THEN } y \text{ is } B^k \tag{1}$$







where $\mathbf{x} = [x_1, \ldots, x_n]^T \in X \subset \mathbf{R}^n$ and $y \in Y \subset \mathbf{R}$ are linguistic variables
corresponding to the input and output of the system, and $A^k = A_1^k \times \cdots \times A_n^k$
and $B^k$ are fuzzy sets characterized by the membership functions $\mu_{A^k}(\mathbf{x})$ and
$\mu_{B^k}(y)$, respectively, for $k = 1, \ldots, N$.

The network illustrated in Fig. 1 reflects the rules (1), which constitute the
rule base (knowledge base) of the neuro-fuzzy system. The second layer of this
network (inference layer) contains elements performing the operation which is
most important from the inference point of view. According to the compositional
rule of inference, introduced by Zadeh [69], the fuzzy set $\bar{B}^k$, inferred by the fuzzy
relation $A^k \to B^k$, which corresponds to the IF-THEN rule $R^{(k)}$, is a composition
of the input fuzzy set $A'$ and the relation, i.e. $A' \circ (A^k \to B^k)$. If the input fuzzy
set $A'$ is the singleton, which means that the singleton fuzzifier is applied, then
the membership function of the fuzzy set $\bar{B}^k$ equals the membership function
of the fuzzy set which represents the fuzzy relation

$$\mu_{\bar{B}^k}(y) = \mu_{A^k \to B^k}(\bar{\mathbf{x}}, y) \tag{2}$$

The elements of the inference layer, in Fig. 1, realize the membership functions
given by Equation (2), for $k = 1, \ldots, N$. The singleton fuzzifier is characterized
by the following membership function

$$\mu_{A'}(\mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{x} = \bar{\mathbf{x}} \\ 0 & \text{if } \mathbf{x} \neq \bar{\mathbf{x}} \end{cases} \tag{3}$$

where the input vector, $\bar{\mathbf{x}} = [\bar{x}_1, \ldots, \bar{x}_n]^T$, is a crisp point in $X = X_1 \times \cdots \times X_n$.
More details can be found in [30].
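
The step from the composition to Equation (2) can be made explicit. The following is a sketch that assumes the usual sup-T form of the compositional rule of inference; the symbol $\ast$ for the T-norm and the bar over the crisp input point $\bar{\mathbf{x}}$ are notational assumptions made here.

```latex
\begin{aligned}
\mu_{\bar{B}^k}(y)
  &= \sup_{\mathbf{x} \in X}
     \left[ \mu_{A'}(\mathbf{x}) \ast \mu_{A^k \to B^k}(\mathbf{x}, y) \right] \\
  &= \mu_{A'}(\bar{\mathbf{x}}) \ast \mu_{A^k \to B^k}(\bar{\mathbf{x}}, y)
     && \text{since } \mu_{A'}(\mathbf{x}) = 0 \text{ and } 0 \ast a = 0
        \text{ for } \mathbf{x} \neq \bar{\mathbf{x}} \\
  &= 1 \ast \mu_{A^k \to B^k}(\bar{\mathbf{x}}, y) \\
  &= \mu_{A^k \to B^k}(\bar{\mathbf{x}}, y)
     && \text{since } 1 \ast a = a \text{ for every T-norm}
\end{aligned}
```

With a singleton input no supremum has to be computed, which is why each node of the inference layer only needs to evaluate the relation at the crisp input point.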

The difference between the systems based on the Mamdani and logical
approach is visible in the inference layer, as well as the aggregation layer,
of the network. With regard to the inference, different membership functions
$\mu_{A^k \to B^k}(\mathbf{x}, y)$ are employed. In the Mamdani approach, the fuzzy relation
$A^k \to B^k$ is represented by the membership functions defined by use of the
minimum or product operation, i.e.

$$\mu_{A^k \to B^k}(\mathbf{x}, y) = \min\{\mu_{A^k}(\mathbf{x}), \mu_{B^k}(y)\} = \mu_{A^k}(\mathbf{x}) \wedge \mu_{B^k}(y) \tag{4}$$

or

$$\mu_{A^k \to B^k}(\mathbf{x}, y) = \mu_{A^k}(\mathbf{x}) \, \mu_{B^k}(y) \tag{5}$$



It should be emphasized that the fuzzy relation $R^{(k)}$ characterized by the
membership functions given by Equations (4) and (5) is not a fuzzy implication,
although it represents a rule of type IF-THEN. Therefore, relations of this
kind are called engineering implications [14].

The logical approach to fuzzy inference employs genuine implications instead
of the engineering implications. Thus, the following membership functions,
characterizing the fuzzy relation $R^{(k)}$, may be applied







$$\mu_{A^k \to B^k}(\mathbf{x}, y) = \max(1 - \mu_{A^k}(\mathbf{x}), \mu_{B^k}(y)) \tag{6}$$

if the Kleene-Dienes implication is used, or

$$\mu_{A^k \to B^k}(\mathbf{x}, y) = \min(1, 1 - \mu_{A^k}(\mathbf{x}) + \mu_{B^k}(y)) \tag{7}$$

if the Łukasiewicz implication is chosen. Other genuine implications are considered
in [30].
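
The difference between the engineering implications of Equations (4) and (5) and the genuine implications of Equations (6) and (7) is easiest to see at the crisp boundary values. The sketch below is only an illustration; the function names are chosen here and take the two membership degrees (antecedent and consequent) as plain numbers in [0, 1].

```python
def mamdani_min(mu_a, mu_b):
    """Engineering implication, Equation (4): minimum."""
    return min(mu_a, mu_b)


def mamdani_product(mu_a, mu_b):
    """Engineering implication, Equation (5): product."""
    return mu_a * mu_b


def kleene_dienes(mu_a, mu_b):
    """Genuine implication, Equation (6)."""
    return max(1.0 - mu_a, mu_b)


def lukasiewicz(mu_a, mu_b):
    """Genuine implication, Equation (7)."""
    return min(1.0, 1.0 - mu_a + mu_b)


relations = [mamdani_min, mamdani_product, kleene_dienes, lukasiewicz]

# Evaluate every relation on the four crisp truth-value combinations.
for mu_a in (0.0, 1.0):
    for mu_b in (0.0, 1.0):
        row = ", ".join(f"{f.__name__}={f(mu_a, mu_b):.0f}" for f in relations)
        print(f"a={mu_a:.0f} b={mu_b:.0f}: {row}")
```

For a false antecedent (degree 0) the two genuine implications return 1, as classical implication does, whereas minimum and product return 0. This is the sense in which the Mamdani relations are not fuzzy implications.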

Different special cases of the connectionist architecture portrayed in Fig. 1
are obtained depending on the fuzzy implication employed. Some examples can
also be found in [38], [39], [40], [22], [42]. The names constructive and destructive,
for the Mamdani and logical approaches, respectively, have been introduced in the
literature [67], but are not often used. The implication-based fuzzy systems have
been studied in [8], [3].

The Mamdani and logical approaches differ also with regard to the aggregation
layer (Fig. 1). For the Mamdani approach, the elements of this layer realize
the T-conorm (S-norm) operation, whereas for the logical approach they realize the
T-norm; see [30] for details. The minimum and product are examples of the T-norm,
while the maximum is an example of the T-conorm. The T-conorm or T-norm,
respectively, is used in order to aggregate the rules and obtain an overall output
fuzzy set $B'$, which is expressed as the union of the fuzzy sets $\bar{B}^k$, for
$k = 1, \ldots, N$, when the Mamdani approach is employed, and as the intersection of
the fuzzy sets $\bar{B}^k$ when the logical approach is applied.
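
The aggregation step can be sketched with a fold over the rule outputs at one fixed output point. This is an illustration only: the helper names and the sample values are assumptions, and the probabilistic sum is included merely as a second example of a T-conorm that is not discussed in the text.

```python
from functools import reduce


def t_min(a, b):
    """Minimum T-norm."""
    return min(a, b)


def t_product(a, b):
    """Product T-norm."""
    return a * b


def s_max(a, b):
    """Maximum T-conorm."""
    return max(a, b)


def s_probabilistic_sum(a, b):
    """Probabilistic-sum T-conorm (dual of the product T-norm)."""
    return a + b - a * b


def aggregate(values, operation):
    """Extend a binary T-norm or T-conorm to N arguments through associativity."""
    return reduce(operation, values)


# Membership degrees of the N = 3 inferred fuzzy sets at one output point y.
rule_outputs = [0.2, 0.7, 0.5]

print("Mamdani (union, max):       ", aggregate(rule_outputs, s_max))
print("logical (intersection, min):", aggregate(rule_outputs, t_min))
```

Any associative T-conorm can take the place of `s_max` for the union, and any T-norm the place of `t_min` for the intersection.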

The defuzzification layer, in Fig. 1, is composed of the three elements mentioned
earlier, and this layer performs according to the method used in order to
obtain a crisp (not fuzzy) output value, $\bar{y}$. In this case, the following
defuzzification method is employed

$$\bar{y} = \frac{\sum_{k=1}^{N} \bar{y}^k \, \mu_{B'}(\bar{y}^k)}{\sum_{k=1}^{N} \mu_{B'}(\bar{y}^k)} \tag{8}$$

where $\mu_{B'}(y)$ is the membership function of the output fuzzy set $B'$, and $\bar{y}^k$, for
$k = 1, \ldots, N$, are discrete points in $Y$ which satisfy the condition

$$\mu_{B^k}(\bar{y}^k) = \max_{y} \{\mu_{B^k}(y)\} \tag{9}$$

Equation (8) expresses a discrete version of the center of area defuzzification
method [7], [30]. It is easy to notice that the defuzzification layer, in Fig. 1,
corresponds exactly to formula (8).
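
Formula (8) and the three-element defuzzification layer can be mirrored in a few lines. This is a sketch under stated assumptions: the function name, the argument names and the sample numbers are illustrative and do not come from the paper.

```python
def defuzzify(centers, memberships):
    """Discrete center-of-area defuzzification, formula (8).

    centers      -- the points y^k at which the consequent sets peak
    memberships  -- the values of the output membership function at those points
    """
    # First linear element: weighted sum (numerator).
    numerator = sum(y_k * mu for y_k, mu in zip(centers, memberships))
    # Second linear element: plain sum (denominator).
    denominator = sum(memberships)
    if denominator == 0:
        raise ValueError("no rule contributes to the output")
    # Third element: division.
    return numerator / denominator


print(defuzzify([1.0, 2.0, 3.0], [0.5, 1.0, 0.5]))
```

The guard against a zero denominator is an addition for robustness; the formula itself assumes that at least one membership value is positive.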

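The connectionist neuro-fuzzy network illustrated in Fig. 1 is described by
the following mathematical expression

$$\bar{y} = \frac{\sum_{k=1}^{N} \bar{y}^k \lambda_k}{\sum_{k=1}^{N} \lambda_k} \qquad (10)$$

where

$$\lambda_k = \begin{cases} \mathop{S}\limits_{j=1}^{N} \mu_{A^j \to B^j}(\mathbf{x}, \bar{y}^k) & \text{for the Mamdani approach} \\ \mathop{T}\limits_{j=1}^{N} \mu_{A^j \to B^j}(\mathbf{x}, \bar{y}^k) & \text{for the logical approach} \end{cases} \qquad (11)$$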



where S and T denote the T-conorm and T-norm, respectively, extended through 
associativity to more than two arguments. For details about T-norm and T- 
conorm functions, see [30], as well as [12]. 

Let us notice that, for the Mamdani approach, the membership functions
$\mu_{A^j \to B^j}$ in Equation (11), for $j = 1, \dots, N$, are defined
according to formulas (4) or (5), respectively, by use of the minimum or
product operation, which are examples of the T-norm. Other T-norm functions
may also be used, e.g. the bounded T-norm [30], [44]. Thus, for the Mamdani
approach, from Equation (11), we have

$$\lambda_k = \mathop{S}\limits_{j=1}^{N} T\{\mu_{A^j}(\mathbf{x}),\ \mu_{B^j}(\bar{y}^k)\} \qquad (12)$$

For the logical approach, the membership functions $\mu_{A^j \to B^j}$ in
Equation (11), for $j = 1, \dots, N$, are defined according to Equations (6),
(7) or other formulas that express genuine implications. Let us denote the
genuine implication by the letter $I$. Thus, we can rewrite Equation (11), for
the logical approach, as follows

$$\lambda_k = \mathop{T}\limits_{j=1}^{N} I\{\mu_{A^j}(\mathbf{x}),\ \mu_{B^j}(\bar{y}^k)\} \qquad (13)$$

The membership functions $\mu_{A^k}(\mathbf{x})$, for $k = 1, \dots, N$, are
defined by use of the T-norm, for both the Mamdani and logical approaches. The
minimum or product operation is usually chosen as the T-norm, which in this
case performs the Cartesian product

$$A^k = A_1^k \times \cdots \times A_n^k \qquad (14)$$

where $A_i^k$, for $i = 1, \dots, n$, and $k = 1, \dots, N$, are antecedent
fuzzy sets in the rules of the following form

$$R^k:\ \text{IF } x_1 \text{ is } A_1^k \text{ AND } x_2 \text{ is } A_2^k \text{ AND } \dots \text{ AND } x_n \text{ is } A_n^k \text{ THEN } y \text{ is } B^k \qquad (15)$$



which is equivalent to the rule base (1). 

Thus, the membership functions $\mu_{A^k}(\mathbf{x})$, for $k = 1, \dots, N$,
are expressed as follows

$$\mu_{A^k}(\mathbf{x}) = \mathop{T}\limits_{i=1}^{n} \{\mu_{A_i^k}(x_i)\} \qquad (16)$$

where the extended T-norm is the minimum or product.

The membership functions $\mu_{A^k}(\mathbf{x})$, for $k = 1, \dots, N$,
defined by Equation (16), are realized by elements of the first layer of the
network illustrated in Fig. 1. For the crisp input vector
$\bar{\mathbf{x}} = [\bar{x}_1, \dots, \bar{x}_n]^T$, the output values of this
layer, $\mu_{A^k}(\bar{\mathbf{x}})$, for $k = 1, \dots, N$, represent the
degree of rule activation (firing strength)

$$\tau_k = \mu_{A^k}(\bar{\mathbf{x}}) = \mathop{T}\limits_{i=1}^{n} \mu_{A_i^k}(\bar{x}_i) \qquad (17)$$

From formulas (2), (10), and (11), we conclude that the output values of the
second (inference) layer of the network portrayed in Fig. 1 are equal to

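$$\mu_{\bar{B}^j}(\bar{y}^k) = \mu_{A^j \to B^j}(\bar{\mathbf{x}}, \bar{y}^k) = R^j(\bar{\mathbf{x}}, \bar{y}^k) \qquad (18)$$

for $j = 1, \dots, N$, where

$$R^j(\bar{\mathbf{x}}, \bar{y}^k) = \begin{cases} T\{\mu_{A^j}(\bar{\mathbf{x}}),\ \mu_{B^j}(\bar{y}^k)\} & \text{for the Mamdani approach} \\ I\{\mu_{A^j}(\bar{\mathbf{x}}),\ \mu_{B^j}(\bar{y}^k)\} & \text{for the logical approach} \end{cases} \qquad (19)$$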



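Comparing formulas (8) and (10), we see that the output values of the third
(aggregation) layer of the network presented in Fig. 1 are

$$\lambda_k = \mu_{B'}(\bar{y}^k) \qquad (20)$$

for $k = 1, \dots, N$, and from Equations (12), (13), and (19), we have

$$\lambda_k = \begin{cases} \mathop{S}\limits_{j=1}^{N} R^j(\bar{\mathbf{x}}, \bar{y}^k) & \text{for the Mamdani approach} \\ \mathop{T}\limits_{j=1}^{N} R^j(\bar{\mathbf{x}}, \bar{y}^k) & \text{for the logical approach} \end{cases} \qquad (21)$$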



which can be called the aggregated value. 

The connection weights of the first linear neuron in the last (defuzzification)
layer of the network shown in Fig. 1 are equal to $\bar{y}^k$, for
$k = 1, \dots, N$. Thus, this layer produces the crisp output value, $\bar{y}$,
according to Equation (10).

As mentioned earlier, the multi-layer architecture of the network illustrated
in Fig. 1 resembles the multi-layer perceptron neural network. Therefore it is
possible to train the neuro-fuzzy connectionist network in a similar way as
neural networks are trained. As a result of the learning process, parameters of
the membership functions $\mu_{A_i^k}(x_i)$ and $\mu_{B^k}(y)$, for
$i = 1, \dots, n$, and $k = 1, \dots, N$, such as centers and widths, for
example $\bar{y}^k$ which is the center of $\mu_{B^k}(y)$, are optimally
adjusted. For details, see e.g. [30].
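The four layers described in this section can be traced in a few lines of code. The sketch below is only an illustration under stated assumptions: Gaussian membership functions for both antecedents and consequents, the product T-norm in the first layer, minimum and maximum as the T-norm and T-conorm of the Mamdani approach, and the Kleene-Dienes implication with minimum aggregation for the logical approach. The rule format and all names are hypothetical.

```python
import math


def gauss(v, center, width):
    """Gaussian membership function (an assumed shape)."""
    return math.exp(-((v - center) / width) ** 2)


def network_output(x, rules, approach="mamdani"):
    """Crisp output of the four-layer network for a crisp input x.

    Each rule is a tuple (centers, widths, y_center, y_width) describing
    the antecedent sets A_i^k and the consequent set B^k.
    """
    # Layer 1: firing strengths tau_k, Equation (17), product T-norm.
    tau = [
        math.prod(gauss(xi, c, w) for xi, c, w in zip(x, centers, widths))
        for centers, widths, _, _ in rules
    ]
    lam = []
    for _, _, y_k, _ in rules:
        # Layer 2: R^j(x, y^k) for every rule j, Equation (19).
        r = []
        for tau_j, (_, _, y_j, w_j) in zip(tau, rules):
            mu_b = gauss(y_k, y_j, w_j)  # membership of y^k in B^j
            if approach == "mamdani":
                r.append(min(tau_j, mu_b))        # T-norm (minimum)
            else:
                r.append(max(1.0 - tau_j, mu_b))  # Kleene-Dienes implication
        # Layer 3: aggregation, Equation (21).
        lam.append(max(r) if approach == "mamdani" else min(r))
    # Layer 4: defuzzification, Equation (10).
    return sum(y_k * l for (_, _, y_k, _), l in zip(rules, lam)) / sum(lam)


rules = [([0.0], [1.0], 0.0, 1.0), ([2.0], [1.0], 4.0, 1.0)]
print(network_output([1.0], rules, "mamdani"))
print(network_output([1.0], rules, "logical"))
```

With the two symmetric rules above, an input halfway between the antecedent centers yields an output halfway between the consequent centers for both approaches; the approaches differ for inputs away from the midpoint.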



4 Flexible Neuro-Fuzzy Systems 

In Section 3, the Mamdani and logical approaches to fuzzy inference are
considered with regard to the general architecture of the connectionist
neuro-fuzzy systems. Despite the same general network, shown in Fig. 1, both
systems (based on the Mamdani or logical approach) are treated separately, as
special cases of this connectionist architecture. This means that at first we
have to choose either the architecture of the network based on the Mamdani
approach or the network for the logical approach. Once the network is chosen,
it can be trained (like neural networks) in order to find optimal values of the
parameters of the membership functions which characterize the fuzzy sets in the
rule base. In this way, the system gathers knowledge represented in the form of
fuzzy IF-THEN rules.

The so-called flexible neuro-fuzzy systems were recently introduced and
developed [47], [4], [5], [48], [49], [50], [51], [52]. Systems of this kind
are represented by the same general architecture as that illustrated in Fig. 1,
but the idea of flexibility, which allows switching smoothly from the Mamdani
type of inference to the logical approach, is incorporated into the network.
During the learning process, the system automatically decides whether it should
be more of the Mamdani or of the logical type. Thus, the system is able to
gather more knowledge, not only about the rules, but also concerning the type
of inference.

The combination of both types of inference, in the way described above, is
possible by means of the so-called compromise operator, defined as follows [52]:
A function

$$N_\nu : [0, 1] \to [0, 1] \qquad (22)$$

given by

$$N_\nu(a) = (1 - \nu) N(a) + \nu N(N(a)) = (1 - \nu) N(a) + \nu a \qquad (23)$$

is called a compromise operator, where $\nu \in [0, 1]$ and
$N(a) = N_0(a) = 1 - a$. The following theorem is formulated and proven in the
literature cited above.



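Theorem 1. Let T and S be dual triangular norms. The function H mapping

$$H : [0, 1]^n \to [0, 1] \qquad (24)$$

and defined by

$$H(\mathbf{a}; \nu) = N_\nu\left(\mathop{S}\limits_{i=1}^{n} \{N_\nu(a_i)\}\right) = N_{1-\nu}\left(\mathop{T}\limits_{i=1}^{n} \{N_{1-\nu}(a_i)\}\right) \qquad (25)$$

varies between the T-norm and T-conorm as $\nu$ increases from 0 to 1.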

Observe that the H-function, T-norm and T-conorm are related to each other
in the following way

$$H(\mathbf{a}; \nu) = \begin{cases} T\{\mathbf{a}\} & \text{for } \nu = 0 \\ 0.5 & \text{for } \nu = 0.5 \\ S\{\mathbf{a}\} & \text{for } \nu = 1 \end{cases} \qquad (26)$$

where $\mathbf{a} = (a_1, \dots, a_n)$, and
$T\{\mathbf{a}\} = \mathop{T}\limits_{i=1}^{n} \{a_i\} = T\{a_1, \dots, a_n\}$.

It is easy to notice that, for $0 < \nu < 0.5$, the H-function resembles a
T-norm and for $0.5 < \nu < 1$ the H-function resembles a T-conorm.
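The behaviour summarized in Equation (26) can be checked numerically. The following sketch implements the compromise operator of Equation (23) and the first form of the H-function in Equation (25), assuming the product T-norm and its dual T-conorm (the probabilistic sum); the function names are hypothetical.

```python
import math


def n_comp(a, nu):
    """Compromise operator, Equation (23): (1 - nu) * (1 - a) + nu * a."""
    return (1.0 - nu) * (1.0 - a) + nu * a


def s_prob(values):
    """Probabilistic sum, the T-conorm dual to the product T-norm."""
    return 1.0 - math.prod(1.0 - v for v in values)


def h_function(values, nu):
    """H-function, Equation (25), first form: N_nu(S{N_nu(a_i)})."""
    return n_comp(s_prob(n_comp(a, nu) for a in values), nu)


a = [0.3, 0.8]
for nu in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(nu, h_function(a, nu))
```

For nu = 0 the result coincides with the product of the arguments, for nu = 1 with their probabilistic sum, and for nu = 0.5 it equals 0.5 regardless of the arguments.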







We apply Theorem 1 in order to illustrate (for $n = 2$) how to switch between
the product T-norm

$$T\{a_1, a_2\} = H(a_1, a_2; 0) = a_1 a_2 \qquad (27)$$

and the corresponding T-conorm

$$S\{a_1, a_2\} = H(a_1, a_2; 1) = a_1 + a_2 - a_1 a_2 \qquad (28)$$

Following Theorem 1, the H-function generated by Equation (27) or (28) is
given by

$$H(a_1, a_2; \nu) = N_{1-\nu}\left(N_{1-\nu}(a_1)\, N_{1-\nu}(a_2)\right) = N_\nu\left(1 - (1 - N_\nu(a_1))(1 - N_\nu(a_2))\right) \qquad (29)$$

and varies from (27) to (28) as $\nu$ increases from 0 to 1.

Now, let us present another theorem formulated and proven in the cited
literature.

Theorem 2. Let T and S be dual triangular norms. Then

$$I_{flex}(a, b; \nu) = H(N_{1-\nu}(a), b; \nu) \qquad (30)$$

switches between an "engineering implication"

$$I_{eng}(a, b) = I_{flex}(a, b; 0) = T\{a, b\} \qquad (31)$$

and an S-implication

$$I_{fuzzy}(a, b) = I_{flex}(a, b; 1) = S\{1 - a, b\} \qquad (32)$$

The following example of the H-implication generated by the product T-norm
illustrates this theorem.

Let

$$I_{eng}(a, b) = H(a, b; 0) = T\{a, b\} = ab \qquad (33)$$

and

$$I_{fuzzy}(a, b) = H(N_0(a), b; 1) = S\{N(a), b\} = 1 - a + ab \qquad (34)$$

then (30) varies between (33) and (34) as $\nu$ increases from 0 to 1.
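The example of Equations (33) and (34) can be reproduced with a short sketch of the flexible implication in Equation (30). As before, the product T-norm and the probabilistic sum are assumed, and the names are hypothetical.

```python
import math


def n_comp(a, nu):
    """Compromise operator, Equation (23)."""
    return (1.0 - nu) * (1.0 - a) + nu * a


def h_function(values, nu):
    """H-function of Equation (25) generated by the product T-norm."""
    s = 1.0 - math.prod(1.0 - n_comp(v, nu) for v in values)
    return n_comp(s, nu)


def i_flex(a, b, nu):
    """Flexible implication, Equation (30): H(N_{1-nu}(a), b; nu)."""
    return h_function([n_comp(a, 1.0 - nu), b], nu)


a, b = 0.6, 0.5
print(i_flex(a, b, 0.0), a * b)            # engineering implication, Eq. (33)
print(i_flex(a, b, 1.0), 1.0 - a + a * b)  # S-implication, Eq. (34)
```

The two printed pairs should agree up to rounding, which is a quick sanity check of the end points of the switch.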

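The flexible neuro-fuzzy system is described as follows

$$\tau_k(\bar{\mathbf{x}}) = H\left(\mu_{A_1^k}(\bar{x}_1), \dots, \mu_{A_n^k}(\bar{x}_n); 0\right) \qquad (35)$$

$$R^k(\bar{\mathbf{x}}, \bar{y}^j) = H\left(N_{1-\nu}(\tau_k(\bar{\mathbf{x}})),\ \mu_{B^k}(\bar{y}^j); \nu\right) \qquad (36)$$

$$\lambda_k(\bar{\mathbf{x}}, \bar{y}^k) = H\left(R^1(\bar{\mathbf{x}}, \bar{y}^k), \dots, R^N(\bar{\mathbf{x}}, \bar{y}^k); 1 - \nu\right) \qquad (37)$$

$$\bar{y} = \frac{\sum_{k=1}^{N} \bar{y}^k \lambda_k(\bar{\mathbf{x}}, \bar{y}^k)}{\sum_{k=1}^{N} \lambda_k(\bar{\mathbf{x}}, \bar{y}^k)} \qquad (38)$$

for $k, j = 1, \dots, N$.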

Let us notice that, from Equation (26), $\tau_k$ expressed by formulas (17)
and (35) is the same. In addition, it is easy to show that the implication,
$I$, in Equation (19) can be expressed as follows

$$I\{\mu_{A^j}(\bar{\mathbf{x}}),\ \mu_{B^j}(\bar{y}^k)\} = S\{N(\mu_{A^j}(\bar{\mathbf{x}})),\ \mu_{B^j}(\bar{y}^k)\} = S\{N(\tau_j(\bar{\mathbf{x}})),\ \mu_{B^j}(\bar{y}^k)\} \qquad (39)$$

where $N$ is the negation operation; see Equations (6), (7), and compare with
the appropriate S-norm. Thus, Equation (36) is equal to Equation (19) if
$\nu = 0$ and $\nu = 1$, respectively, for the Mamdani and logical approaches.
We can also conclude that Equation (37) is equal to (21) for the Mamdani
approach if $\nu = 0$, since then $H(\cdot\,; 1 - \nu) = H(\cdot\,; 1)$ is the
T-conorm, and for the logical approach if $\nu = 1$.

Thus, the architecture of the system described by Equation (38) is the same
as that illustrated in Fig. 1, but this neuro-fuzzy system is flexible, which
means that it varies between the Mamdani type of inference ($\nu = 0$) and the
logical type of inference ($\nu = 1$), as $\nu$ increases from 0 to 1. The
parameter $\nu$ is called the compromise parameter, and can be determined by a
learning process.
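The flexible system of Equations (35) to (38) can be sketched as a single forward pass in which the compromise parameter is an ordinary argument. This is an illustration only: Gaussian membership functions, the product T-norm as the generator of the H-function, and the rule format are assumptions of the sketch, and all names are hypothetical.

```python
import math


def n_comp(a, nu):
    """Compromise operator, Equation (23)."""
    return (1.0 - nu) * (1.0 - a) + nu * a


def h_function(values, nu):
    """H-function of Equation (25) generated by the product T-norm."""
    s = 1.0 - math.prod(1.0 - n_comp(v, nu) for v in values)
    return n_comp(s, nu)


def gauss(v, center, width):
    """Gaussian membership function (an assumed shape)."""
    return math.exp(-((v - center) / width) ** 2)


def flexible_output(x, rules, nu):
    """Output of the flexible system; rules are (centers, widths, y_c, y_w)."""
    # Equation (35): firing strengths, H with parameter 0 (a T-norm).
    tau = [
        h_function([gauss(xi, c, w) for xi, c, w in zip(x, cs, ws)], 0.0)
        for cs, ws, _, _ in rules
    ]
    lam = []
    for _, _, y_point, _ in rules:
        # Equation (36): flexible implication evaluated at the point y_point.
        r = [
            h_function([n_comp(t, 1.0 - nu), gauss(y_point, y_c, y_w)], nu)
            for t, (_, _, y_c, y_w) in zip(tau, rules)
        ]
        # Equation (37): flexible aggregation with parameter 1 - nu.
        lam.append(h_function(r, 1.0 - nu))
    # Equation (38): weighted mean of the consequent centers.
    return sum(y * l for (_, _, y, _), l in zip(rules, lam)) / sum(lam)


rules = [([0.0], [1.0], 0.0, 1.0), ([2.0], [1.0], 4.0, 1.0)]
for nu in (0.0, 0.3, 1.0):
    print(nu, flexible_output([0.5], rules, nu))
```

In a trainable system nu would be one more parameter adjusted by the learning procedure; here it is simply swept over a few values to show the effect of the switch.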



5 Neuro-Fuzzy Relational Systems 

In this section, the so-called relational neuro-fuzzy systems are presented. They 
differ from the rule-based systems, described in Sections 3 and 4. Fuzzy relational 
systems store associations between input and output fuzzy sets in a discrete 
fuzzy relation [29]. The systems of this kind may be applied to control [2] or 
classification problems [56], like the systems considered in the previous sections. 
However, relational systems are more convenient from the viewpoint of adjusting 
parameters, which is realized by changing elements of the relation. Neuro-fuzzy 
relational systems are developed in [57], [58], [59], [60]. 

As in the previous sections, assume that
$\mathbf{x} = [x_1, \dots, x_n]^T \in X \subset \mathbb{R}^n$ and
$y \in Y \subset \mathbb{R}$, which means an n-dimensional input,
$\mathbf{x}$, and a scalar output, $y$, that is, a MISO fuzzy system. Let $A$
and $B$ denote collections of linguistic terms $A^k$ and $B^m$, respectively

$$A = \{A^1, A^2, \dots, A^K\} \qquad (40)$$

characterized by membership functions $\mu_{A^k}(\mathbf{x})$, for
$k = 1, \dots, K$, and

$$B = \{B^1, B^2, \dots, B^M\} \qquad (41)$$

characterized by membership functions $\mu_{B^m}(y)$, for $m = 1, \dots, M$.

Sets (40) and (41) are related to each other with a certain degree by a
$K \times M$ relation matrix $R$


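$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1M} \\ r_{21} & r_{22} & \cdots & r_{2M} \\ \vdots & \vdots & r_{km} & \vdots \\ r_{K1} & r_{K2} & \cdots & r_{KM} \end{bmatrix} \qquad (42)$$

where $r_{km} \in [0, 1]$, and $k = 1, \dots, K$, $m = 1, \dots, M$.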

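The relation $R$, defined by Equation (42), represents a mapping $A \to B$.
Let $\bar{\mathbf{x}} = [\bar{x}_1, \dots, \bar{x}_n]^T$ be a crisp input
vector. Given a vector $\bar{A}$ of $K$ membership values
$\mu_{A^k}(\bar{\mathbf{x}})$, for a crisp observed input value
$\bar{\mathbf{x}}$, we can obtain a vector $\bar{B}$ of $M$ crisp membership
values $\mu_m$, using the fuzzy relational composition

$$\bar{B} = \bar{A} \circ R \qquad (43)$$

implemented element-wise by a generalized form of the sup-min composition,
that is, the S-T composition

$$\mu_m = \mathop{S}\limits_{k=1}^{K} \left[T(\mu_{A^k}(\bar{\mathbf{x}}),\ r_{km})\right]$$

The crisp output of the relational system is determined by the weighted mean

$$\bar{y} = \frac{\sum_{m=1}^{M} \bar{y}^m \mathop{S}\limits_{k=1}^{K} \left[T(\mu_{A^k}(\bar{\mathbf{x}}),\ r_{km})\right]}{\sum_{m=1}^{M} \mathop{S}\limits_{k=1}^{K} \left[T(\mu_{A^k}(\bar{\mathbf{x}}),\ r_{km})\right]} \qquad (44)$$

where $\bar{y}^m$ is the centre of gravity (centroid) of the fuzzy set $B^m$.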
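The S-T composition and the weighted mean above can be illustrated with the classical max-min choice (maximum as S, minimum as T). The relation matrix, membership values and centroids below are made-up numbers, and the function name is hypothetical.

```python
def relational_output(mu_a, relation, y_centroids, t_norm=min, s_norm=max):
    """S-T composition followed by the weighted mean of the centroids.

    mu_a        -- K membership values of the crisp input in the sets A^k
    relation    -- K x M matrix of elements r_km in [0, 1]
    y_centroids -- M centroids of the output fuzzy sets B^m
    """
    mu_b = [
        s_norm(t_norm(a, row[m]) for a, row in zip(mu_a, relation))
        for m in range(len(y_centroids))
    ]
    total = sum(mu_b)
    y_bar = sum(y * m for y, m in zip(y_centroids, mu_b)) / total
    return mu_b, y_bar


mu_b, y_bar = relational_output(
    mu_a=[0.7, 0.3],
    relation=[[1.0, 0.2], [0.1, 0.9]],
    y_centroids=[0.0, 10.0],
)
print(mu_b, y_bar)
```

Passing other two-argument functions as `t_norm` (for example multiplication) changes the composition without touching the rest of the computation, which mirrors the remark that the system is tuned by changing the elements of the relation.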

The system described by Equation (44) can be represented in the form of the
connectionist neuro-fuzzy network illustrated in Fig. 1. The first (antecedent)
layer is the same, but it contains $K$ nodes realizing the membership functions
$\mu_{A^k}$. The last (defuzzification) layer is also the same, but the first
linear neuron has $M$ inputs, and the connection weights are equal to
$\bar{y}^m$. The third (aggregation) layer includes K nodes performing the S
operation (T-conorm) or the soft OWA S-norm [59]. The second (inference) layer
is composed of T-norm elements, with $r_{km}$ as inputs to these nodes.

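In [59], [60] a new relational system with fuzzy antecedent certainty factors
is considered. In this case, the relational matrix contains fuzzy sets $C_{km}$
defined on the unitary interval

$$R = \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1M} \\ C_{21} & C_{22} & \cdots & C_{2M} \\ \vdots & \vdots & C_{km} & \vdots \\ C_{K1} & C_{K2} & \cdots & C_{KM} \end{bmatrix} \qquad (45)$$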







These fuzzy sets represent linguistic values, which can express an uncertainty
concerning antecedent terms. In SISO or MISO systems with multidimensional
antecedent fuzzy sets, an expert may define rules which are similar to the
following exemplary ones

$R^1$: IF $x$ is exactly $A^1$ THEN $y$ is $B^1$
$R^2$: IF $x$ is more or less $A^1$ THEN $y$ is $B^2$          (46)
$R^3$: IF $x$ is roughly $A^1$ THEN $y$ is $B^3$

The description of the system by a mathematical formula, similar to
Equation (44), and the connectionist architecture of the system are presented
in [60].

Knowledge acquisition in neuro-fuzzy relational systems is performed by the 
learning process that includes a clustering algorithm, fine-tuning by the back- 
propagation method, and computing the fuzzy relation based on the training 
data using relational equations [58] . 



6 Probabilistic Neural Networks 



Another kind of connectionist networks are probabilistic neural networks, which 
are equivalent to the inference neural networks. Probabilistic neural networks are 
studied in [61], [28], [16], and also in [45], [46], [53], [24], where the equivalence, 
mentioned above, is shown. 

Based on the sample sequence $\{(X, Y), (X^1, Y^1), \dots, (X^N, Y^N)\}$ of
i.i.d. random variables, in order to estimate the regression function

$$R(\mathbf{x}) = E[Y \mid X = \mathbf{x}] \qquad (47)$$



the following estimator is proposed in the cited literature

$$R_n(\mathbf{x}) = \ldots \qquad (48)$$



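Applying the following kernel function

$$G(\mathbf{x}) = (2\pi)^{-n/2}\, e^{-\frac{1}{2}\|\mathbf{x}\|^2} \qquad (49)$$

to Equation (48), we obtain

$$R_n(\mathbf{x}) = \frac{\sum_{k=1}^{N} Y^k \prod_{i=1}^{n} \exp\left[-\frac{1}{2}(\ldots)^2\right]}{\sum_{k=1}^{N} \prod_{i=1}^{n} \exp\left[-\frac{1}{2}(\ldots)^2\right]} \qquad (50)$$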







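It is easy to notice that the form of Equation (50) is the same as the
following description of the Mamdani type neuro-fuzzy network

$$\bar{y} = \frac{\sum_{k=1}^{N} \bar{y}^k \prod_{i=1}^{n} \exp\left[-\left(\frac{x_i - \bar{x}_i^k}{\sigma_i^k}\right)^2\right]}{\sum_{k=1}^{N} \prod_{i=1}^{n} \exp\left[-\left(\frac{x_i - \bar{x}_i^k}{\sigma_i^k}\right)^2\right]} \qquad (51)$$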



The network described by Equation (51) is a special case of the connectionist
neuro-fuzzy network portrayed in Fig. 1, where the elements of the first layer
realize the Gaussian membership functions

$$\mu_{A_i^k}(x_i) = \exp\left[-\left(\frac{x_i - \bar{x}_i^k}{\sigma_i^k}\right)^2\right] \qquad (52)$$



This neuro-fuzzy network is proposed in [66], and developed in [30]. This
network is illustrated in Fig. 2 and equivalent to that shown in Fig. 3

$$\mu_{A^k}(\mathbf{x}) = \prod_{i=1}^{n} \exp\left[-\left(\frac{x_i - \bar{x}_i^k}{\sigma_i^k}\right)^2\right] \qquad (53)$$

where $\bar{x}_i^k$ and $\sigma_i^k$ are parameters (centers and widths,
respectively) of the Gaussian membership functions; $i = 1, \dots, n$, and
$k = 1, \dots, N$.
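The Mamdani type network with Gaussian membership functions reduces to a normalized weighted sum of the consequent centers. The sketch below assumes the Gaussian form exp(-((x_i - center) / width)^2) with one center and one width per input and rule; the parameter values and the function name are hypothetical.

```python
import math


def gaussian_network(x, centers, widths, y_centers):
    """Output of a Mamdani type network with Gaussian antecedents.

    centers[k][i], widths[k][i] -- parameters of the i-th antecedent of rule k
    y_centers[k]                -- center of the consequent set of rule k
    """
    # First layer: product of Gaussian memberships for every rule.
    activations = [
        math.prod(math.exp(-((xi - c) / s) ** 2) for xi, c, s in zip(x, cs, ss))
        for cs, ss in zip(centers, widths)
    ]
    # Remaining layers: normalized weighted sum of the consequent centers.
    return sum(y * a for y, a in zip(y_centers, activations)) / sum(activations)


centers = [[0.0], [2.0]]
widths = [[1.0], [1.0]]
y_centers = [0.0, 4.0]
for x in ([0.0], [1.0], [2.0]):
    print(x, gaussian_network(x, centers, widths, y_centers))
```

The same computation is performed by a normalized radial basis function network, which is why the two architectures are regarded as equivalent.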

The parameters of the membership functions are adjusted during the learn- 
ing process. In this way, the system gathers knowledge about the shapes of the 
fuzzy sets in the antecedent part of the rules. The most popular learning method 
employed in the neuro-fuzzy system of this kind is the gradient method (based 
on the steepest descent optimization procedure) which is similar to the back- 
propagation algorithm [73]. Apart from this method, a genetic (evolutionary) 
algorithm [15] can be applied. For details, see [30], as well as e.g. [20], [21]. 
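To make the gradient method concrete, the sketch below performs steepest-descent updates of the consequent centers only, for the squared error E = (y - d)^2 / 2 with the output computed as a normalized weighted sum. For this output the partial derivative of E with respect to a center is (y - d) times the normalized rule activation. Keeping the rule activations fixed, the learning rate, and all names are simplifying assumptions of this example; the methods cited above also adjust the centers and widths of the antecedent sets.

```python
def output(y_centers, activations):
    """Normalized weighted sum of the consequent centers."""
    total = sum(activations)
    return sum(y * a for y, a in zip(y_centers, activations)) / total


def gradient_step(y_centers, activations, target, eta=0.5):
    """One steepest-descent update of the centers for E = (y - d)^2 / 2."""
    error = output(y_centers, activations) - target
    total = sum(activations)
    # dE/dy^k = (y - d) * activation_k / sum of activations
    return [y - eta * error * a / total for y, a in zip(y_centers, activations)]


activations = [0.6, 0.4]  # fixed rule activations for one training sample
y_centers = [0.0, 1.0]
target = 1.0
for _ in range(5):
    print(output(y_centers, activations))
    y_centers = gradient_step(y_centers, activations, target)
```

Each printed value should be closer to the target than the previous one, since the error shrinks by a constant factor per step in this simplified setting.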

The architecture of the neuro-fuzzy connectionist system shown in Fig. 3 is 
the same as the normalized RBF neural network, introduced in [17]; see e.g. [30]. 



7 Other Connectionist Neuro-Fuzzy Systems 

Based on the systems presented in the previous sections, many different neuro- 
fuzzy networks can be used. Various fuzzy inference neural networks may be 
created as special cases of the general architecture portrayed in Fig. 1. Apart 
from the systems described in Sections 4 and 5, more complex — hierarchical 
connectionist neuro-fuzzy systems can be constructed [25], [26], [27], as well as 
the systems that incorporate membership functions of type 2 fuzzy sets [54], 
[62], [63], [64], and the hybrid rough-neuro-fuzzy systems [23].

When a classification task is considered, the connectionist architecture may
be even simpler. In this case, it is not necessary to apply the defuzzification
layer, so with regard to the network shown in Fig. 3 only the first layer is
sufficient, and for the equivalent network portrayed in Fig. 2 we need the
first two layers of this network. Such a simple classification network is
portrayed in Fig. 4. The input values represent the attribute values which
characterize the object to be classified. The output values correspond to the
classes to which the object can belong. The nodes that realize membership
functions can perform other functions than Gaussian, e.g. triangular,
trapezoidal, etc., as well as type 2 membership functions, and input values do
not need to be only numerical (crisp), in a more general case, when the network
is considered as a fuzzy network.

Fig. 2. Basic architecture of the connectionist neuro-fuzzy system

Fig. 3. Basic architecture of fuzzy inference neural network based on Mamdani
approach




Fig. 4. Simple classification neuro-fuzzy network



The network shown in Fig. 4 is very simple, but if necessary, more complex
architectures may be employed, e.g. hierarchical or multi-segment
connectionist systems that use this network as a component.
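A classification network without the defuzzification layer can be sketched as follows. The example assumes one rule with Gaussian membership functions per class and assigns the object to the class with the highest rule activation; this decision rule, the class labels, and the parameter values are assumptions made for illustration only.

```python
import math


def rule_activation(x, centers, widths):
    """Product of Gaussian memberships of the attribute values x."""
    return math.prod(
        math.exp(-((xi - c) / s) ** 2) for xi, c, s in zip(x, centers, widths)
    )


def classify(x, class_rules):
    """Return the label with the highest activation and all activations.

    class_rules maps a class label to (centers, widths) of its rule.
    """
    scores = {
        label: rule_activation(x, centers, widths)
        for label, (centers, widths) in class_rules.items()
    }
    return max(scores, key=scores.get), scores


class_rules = {
    "low": ([0.0, 0.0], [1.0, 1.0]),
    "high": ([1.0, 1.0], [1.0, 1.0]),
}
label, scores = classify([0.1, 0.2], class_rules)
print(label, scores)
```

The activations themselves can be kept as graded outputs when a crisp decision is not required, for example when several such networks are combined in a hierarchical system.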



8 Perception-Based Classification Systems 

Perception-based intelligent systems that imitate the way of reasoning performed 
by humans may be created in the form of the neuro-fuzzy network, portrayed 
in Fig. 4, with applications to classification problems. Examples are presented 
in [32], [33]. The systems analyze data and generate fuzzy IF-THEN rules, em- 
ploying the fuzzy granulation [70]. The granules obtained are labeled by words 
that express perceptions concerning the attributes characterizing the objects to 
be classified. The reasoning based on these rules refers to the computing with 
words and perceptions [72], [34]. 

Fuzzy sets of type 2 can be used in order to represent an uncertainty with 
regard to the membership functions applied in such systems [32]. 

The main advantage of the multi-stage classification algorithm [36], which
employs the perception-based approach, is the elimination of misclassifications.
This is especially important in medical applications [33]. If it is not possible
to classify all input data without misclassifications, the multi-stage
perception-based algorithm can be combined with other classification methods,
in the multi-expert system [35].






9 Conclusions 

Neuro-fuzzy systems are soft computing methods utilizing artificial neural net- 
works and fuzzy systems. Various connectionist architectures of neuro-fuzzy sys- 
tems can be constructed. The knowledge acquisition concerns fuzzy IF-THEN 
rules, and is performed by a learning process. The systems realize an inference 
(fuzzy reasoning) based on these rules. Different applications of the systems can 
be distinguished, such as classification, control, function approximation, predic- 
tion. The systems can be viewed as intelligent systems, and considered with 
regard to artificial intelligence and cognitive sciences. 

References 

1. Bezdek, J.C.: What is computational intelligence? In: Zurada J.M., Marks II R.J., 
Robinson C.J. (Eds.): Computational Intelligence: Imitating Life. IEEE Press. New 
York (1994) 1-12 

2. Branco, P.J.C., Dente, J.A.: A Fuzzy Relational Identification Algorithm and its 
Application to Predict the Behaviour of a Motor Drive System. Fuzzy Sets and 
Systems. 109 (2000) 343-354 

3. Cordon, O., Herrera, F., Peregrin, A.: T-norms vs. Implication Functions as
Implication Operators in Fuzzy Control. Proc. 6th International Fuzzy Systems
Association World Congress (IFSA’95). Sao Paulo. Brazil (1995) 501-504

4. Cpalka, K., Rutkowski, L.: Soft Neuro-Fuzzy Systems. Proc. Fifth Conference Neu- 
ral Networks and Soft Computing. Zakopane. Poland (2000) 296-301 

5. Cpalka, K., Rutkowski, L.: Compromise Neuro-Fuzzy System. Proc. Fourth
International Conference on Parallel Processing and Applied Mathematics.
Częstochowa Poland (2001) 33-40

6. Czogala, E., Leski, J.: Fuzzy and Neuro-Fuzzy Intelligent Systems.
Physica-Verlag. A Springer-Verlag Company. Heidelberg New York (2000)

7. Driankov, D., Hellendoorn, H., Reinfrank, M.: An Introduction to Fuzzy
Control. Springer-Verlag. Berlin, Heidelberg (1993)

8. Dubois, D., Prade, H.: Fuzzy Sets in Approximate Reasoning. Part I: Inference 
with possibility distribution. Fuzzy Sets and Systems 40 (1991) 143-202 

9. Giarratano, J., Riley, G.: Expert Systems: Principles and Programming. PWS
Publishing Company Boston MA (1998)

10. Jackson, P.: Introduction to Expert Systems. Addison Wesley (1999)

11. Jang, J.-S.R., Sun, C.-T.: Functional Equivalence between Radial Basis
Function Networks and Fuzzy Inference Systems. IEEE Trans. Neural Networks 4 1
(1993) 156-159

12. Klement, E.P., Mesiar, R., Pap, E.: Triangular Norms. Kluwer Academic
Publishers Dordrecht Boston London (2000)

13. Mamdani, E.H., Assilian, S.: An Experiment in Linguistic Synthesis with a Fuzzy 
Logic Controller. International Journal of Man-Machine Studies 7 (1975) 1-13 

14. Mendel, J.M.: Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New 
Directions. Prentice Hall Upper Saddle River N.J. (2001) 

15. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs.
Springer-Verlag Berlin. Heidelberg. New York (1992)

16. Montana, D.J.: A Weighted Probabilistic Neural Network. Advances in Neural 
Information Processing Systems 4 (1992) 1110-1117 







17. Moody, J., Darken, C.: Learning with Localized Receptive Fields. In: Touretzky, D., 
Hinton, G., Sejnowski, T. (eds.): 1988 Connectionist Models Summer School, Pitts- 
burgh Morgan Kaufmann Publishers San Mateo CA USA (1989) 133-143 

18. Nauck, D.: A Fuzzy Perceptron as a Generic Model for Neuro-Fuzzy Approaches. 
Proc. Conference: Fuzzy-Systeme’94, Munich (1994) 

19. Nauck, D., Klawonn, F., Kruse, R.: Foundations of Neuro-Fuzzy Systems. John 
Wiley & Sons (1997) 

20. Nomura, H., Hayashi, I., Wakami, N.: A Self-Tuning Method of Fuzzy Control
by Descent Method. Proc. 4th International Fuzzy Systems Association World
Congress, IFSA’91 Brussels Belgium (1991) 155-158

21. Nomura, H., Hayashi, I., Wakami, N.: A Self-Tuning Method of Fuzzy Reasoning
by Genetic Algorithm. Proceedings of the 1992 International Fuzzy Systems and
Intelligent Control Conference, Louisville KY USA (1992) 236-245

22. Nowicki, R., Rutkowska, D.: Neuro-Fuzzy Architectures Based on Yager Implica- 
tion. Proc. 5th Conference on Neural Networks and Soft Computing, Zakopane 
Poland (2000) 353-360 

23. Nowicki, R., Rutkowski, L.: Rough-Neuro-Fuzzy System for Classification. Proc. 
9th International Conference on Neural Information Processing, ICONIP’02 Orchid 
Country Club Singapore (2002) 

24. Nowicki, R., Rutkowski, L.: Soft Techniques for Bayesian Classification. In:
Rutkowski, L., Kacprzyk, J. (eds.): Neural Networks and Soft Computing,
Physica-Verlag, A Springer-Verlag Company Heidelberg New York (2003) 537-544

25. Nowicki, R., Scherer, R., Rutkowski, L.: A Neuro-Fuzzy System Based on the 
Hierarchical Prioritized Structure. Proc. 10th Zittau Fuzzy Colloquium, Zittau 
Germany (2002) 192-198 

26. Nowicki, R., Scherer, R., Rutkowski, L.: A Method for Learning of Hierarchical 
Fuzzy Systems. Proc. 2nd Euro-International Symposium on Computational Intel- 
ligence 76 Kosice Slovakia (2002) 124-129 

27. Nowicki, R., Scherer, R., Rutkowski, L.: A Hierarchical Neuro-Fuzzy System
Based on s-Implication. IJCNN-2003 Proc. International Joint Conference on
Neural Networks, Portland Oregon (2003) 321-325

28. Patterson, D.W.: Artificial Neural Networks: Theory and Applications.
Prentice Hall Singapore (1996)

29. Pedrycz, W.: Fuzzy Control and Fuzzy Systems. Research Studies Press London 
(1989) 

30. Rutkowska, D.: Neuro-Fuzzy Architectures and Hybrid Learning. Physica-Verlag,
A Springer-Verlag Company Heidelberg New York (2002)

31. Rutkowska, D.: Type 2 Fuzzy Neural Networks: an Interpretation Based on Fuzzy 
Inference Neural Networks with Fuzzy Parameters. Proc. 2002 IEEE Congress on 
Computational Intelligence, FUZZ-IEEE’02, Honolulu Hawaii (2002) 1180-1185 

32. Rutkowska, D.: A Perception-Based Classification System. Proc. CIMCA 2003 
Conference, Vienna Austria (2003) 52-61 

33. Rutkowska, D.: Perception-Based Systems for Medical Diagnosis. Proc. Third 
EUSFLAT 2003, Zittau Germany (2003) 741-746 

34. Rutkowska, D.: Perception-Based Reasoning: Evaluation Systems. International
Journal Task Quarterly, 7 1 (2003) 131-145

35. Rutkowska, D.: Multi-Expert Systems. Proc. 5th International Conference:
Parallel Processing and Applied Mathematics. Częstochowa Poland (2003)

36. Rutkowska, D.: Perception-Based Expert Systems. Soft Computing Journal (2003) 
submitted 







37. Rutkowska, D., Hayashi, Y.: Neuro-Fuzzy Systems Approaches. Journal of Ad- 
vanced Computational Intelligence. 3 3 (1999) 177-185 

38. Rutkowska, D., Nowicki, R.: Fuzzy Inference Neural Networks Based on Destruc- 
tive and Constructive Approaches and Their Application to Classification. Proc. 
4th Conference on Neural Networks and Their Applications, Zakopane Poland 
(1999) 294-301 

39. Rutkowska, D., Nowicki, R.: Constructive and Destructive Approach to Neuro- 
Fuzzy Systems. Proc. EUROFUSE-SIC’99, Budapest Hungary (1999) 100-105 

40. Rutkowska, D., Nowicki, R.: Neuro-Fuzzy Architectures Based on Fodor Implica- 
tion. Proc. 8th Zittau Fuzzy Colloquium, Zittau Germany (2000) 230-237 

41. Rutkowska, D., Nowicki, R.: Implication-Based Neuro-Fuzzy Architectures. Inter- 
national Journal of Applied Mathematics and Computer Science 10 4 (2000) 675- 
701 

42. Rutkowska, D., Nowicki, R.: Neuro-Fuzzy Systems: Destructive Approach. In:
Chojcan, J., Leski, J. (eds.): Fuzzy Sets and Their Applications, Silesian
University Press Gliwice Poland (2001) 285-292

43. Rutkowska, D., Kacprzyk, J., Zadeh, L.A. (eds.): Computing with Words and Per- 
ceptions. International Journal of Applied Mathematics and Computer Science 12 
3 (2002) 

44. Rutkowska, D., Rutkowski, L., Nowicki, R.: Neuro-Fuzzy System with Inference
Based on Bounded Product. In: Mastorakis, N. (ed.): Advances in Neural Networks
and Applications, World Scientific and Engineering Society Press (2001) 104-109

45. Rutkowski, L.: Identification of MISO Nonlinear Regressions in the Presence of a 
Wide Class of Disturbances. IEEE Trans. Information Theory IT-37 (1991) 214- 
216 

46. Rutkowski, L.: Multiple Fourier Series Procedures for Extraction of Nonlinear Re- 
gressions from Noisy Data. IEEE Trans. Signal Processing 41 10 (1993) 3062-3065 

47. Rutkowski, L., Cpalka, K.: Flexible Structures of Neuro-Fuzzy Systems. Quo Vadis 
Computational Intelligence, Studies in Fuzziness and Soft Computing, Springer 54 
(2000) 479-484 

48. Rutkowski, L., Cpalka, K.: A General Approach to Neuro-Fuzzy Systems. Proc. 
10th IEEE International Conference on Fuzzy Systems, Melbourne Australia 
(2001) 

49. Rutkowski, L., Cpalka, K.: A Neuro-Fuzzy Controller with a Compromise Fuzzy 
Reasoning. Control and Cybernetics 31 2 (2002) 297-308 

50. Rutkowski, L., Cpalka, K.: Compromise Approach to Neuro-Fuzzy Systems. Proc. 
2nd Euro-International Symposium on Computational Intelligence 76 Kosice Slo- 
vakia (2002) 85-90 

51. Rutkowski, L., Cpalka, K.: Flexible Weighted Neuro-Fuzzy Systems. Proc. 9th In- 
ternational Conference on Neural Information Processing, ICONIP’02, Orchid 
Country Club Singapore (2002) 

52. Rutkowski, L., Cpalka, K.: Flexible Neuro-Fuzzy Systems. IEEE Trans. Neural 
Networks 14 (2003) 554-574 

53. Rutkowski, L., Galkowski, T.: On Pattern Classification and System Identification 
by Probabilistic Neural Networks. Applied Mathematics and Computer Science 4 
3 (1994) 413-422 

54. Rutkowski, L., Starczewski, J.: From Type-1 to Type-2 Fuzzy Inference Systems
- Part 1, Part 2. Proc. Fifth Conference on Neural Networks and Soft Computing,
Zakopane Poland (2000) 46-51, 52-65

55. Sage, A.P. (ed.): Concise Encyclopedia of Information Processing in Systems
and Organization. Pergamon Press New York (1990)







56. Setnes, M., Babuska, R.: Fuzzy Relational Classifier Trained by Fuzzy
Clustering. IEEE Trans. Systems, Man and Cybernetics - Part B: Cybernetics 29 5
(1999) 619-625

57. Scherer, R., Rutkowski, L.: A Neuro-Fuzzy Relational System. Proc. Fourth
International Conference on Parallel Processing and Applied Mathematics,
Częstochowa Poland (2001) 131-135

58. Scherer, R., Rutkowski, L.: Relational Equations Initializing Neuro-Fuzzy
System. Proc. 10th Zittau Fuzzy Colloquium, Zittau Germany (2002) 212-217

59. Scherer, R., Rutkowski, L.: Neuro-Fuzzy Relational Systems. Proc. 9th
International Conference on Neural Information Processing, ICONIP’02, Orchid
Country Club Singapore (2002)

60. Scherer, R., Rutkowski, L.: A Fuzzy Relational System with Linguistic
Antecedent Certainty Factors. In: Rutkowski, L., Kacprzyk, J. (eds.): Neural
Networks and Soft Computing, Physica-Verlag, A Springer-Verlag Company
Heidelberg New York (2003) 563-569

61. Specht, D.: Probabilistic Neural Networks. Neural Networks 3 1 (1990) 109-118 

62. Starczewski, J., Rutkowski, L.: Connectionist Structures of Type 2 Fuzzy Inference 
Systems. Lecture Notes in Computer Science 2328 (2001) 634-642 

63. Starczewski, J., Rutkowski, L.: Neuro-Fuzzy Inference Systems of Type 2.
Proc. 9th International Conference on Neural Information Processing, ICONIP’02,
Orchid Country Club Singapore (2002)

64. Starczewski, J., Rutkowski, L.: Interval Type 2 Neuro-Fuzzy Systems Based on
Interval Consequents. In: Rutkowski, L., Kacprzyk, J. (eds.): Neural Networks
and Soft Computing, Physica-Verlag, A Springer-Verlag Company Heidelberg New
York (2003) 570-577

65. Takagi, H.: Fusion Technology of Neural Networks and Fuzzy Systems: A Chroni- 
cled Progression from the Laboratory to Our Daily Lives. International Journal of 
Applied Mathematics and Computer Science 10 4 (2000) 647-673 

66. Wang, L.-X.: Adaptive Fuzzy Systems and Control. PTR Prentice Hall Englewood 
Cliffs New Jersey (1994) 

67. Yager, R.R., Filev, D.P.: Essentials of Fuzzy Modeling and Control. John Wiley & 
Sons (1994) 

68. Zadeh, L.A.: Towards a Theory of Fuzzy Systems. In: Kalman, R.E., 
DeClaris, N. (eds.): Aspects of Network and System Theory, Holt. Rinehart and 
Winston New York (1971) 

69. Zadeh, L.A.: Outline of a New Approach to the Analysis of Complex Systems 
and Decision Processes. IEEE Trans. Systems, Man, and Cybernetics SMC-3 1 
(1973) 28-44 

70. Zadeh, L.A.: Fuzzy Sets and Information Granularity. In: Gupta, M., Ragade, R., 
Yager, R. (eds.): Advances in Fuzzy Set Theory and Applications, North Holland 
Amsterdam (1979) 3-18 

71. Zadeh, L.A.: Fuzzy Logic, Neural Networks and Soft Computing. Communications 
of the ACM 37 3 (1994) 77-84 

72. Zadeh, L.A.: From Computing with Numbers to Computing with Words - from 
Manipulation of Measurements to Manipulation of Perceptions. IEEE Trans. Cir- 
cuits and Systems - I: Fundamental Theory and Applications 45 1 (1999) 105-119 

73. Zurada, J.M.: Introduction to Artificial Neural Systems. West Publishing Company 
(1992) 




Algorithms for Scalable Storage Servers* 



Peter Sanders 



Max-Planck-Institut für Informatik
Saarbrücken, Germany

sanders@mpi-sb.mpg.de



Abstract. We survey a set of algorithmic techniques that make it pos- 
sible to build a high performance storage server from a network of cheap 
components. Such a storage server offers a very simple programming 
model. To the clients it looks like a single very large disk that can handle 
many requests in parallel with minimal interference between the requests. 
The algorithms use randomization, redundant storage, and sophisticated 
scheduling strategies to achieve this goal. The focus is on algorithmic 
techniques and open questions. The paper summarizes several previous 
papers and presents a new strategy for handling heterogeneous disks. 



1 Introduction 

It is said that our society is an information society, i.e., efficiently storing and 
retrieving a vast amount of information has become a driving force of our econo- 
my and society. Most of this information is stored on hard disks — many hard 
disks actually. Some applications (e.g., geographical information systems, satel- 
lite image libraries, climate simulation, particle physics) already measure their 
data bases in petabytes ($10^{15}$ bytes). Currently, the largest of these applications 
use huge tape libraries, but hard disks can now store the same data for a similar 
price offering much higher performance [13]. To store such amounts of data one 
would need about 10 000 disks. Systems with thousands of disks have already 
been built and there are projects for “mid-range” systems that would scale to 
12 000 disks. 

This paper discusses algorithmic challenges resulting from the goal to operate 
large collections of hard disks in an efficient, reliable, flexible, and user-friendly 
way. Some of these questions are already relevant if you put four disks in your 
PC. But things get really interesting (also from a theoretical point of view) 
if we talk about up to 1024 disks in a traditional monolithic storage server 
(e.g., http://www.hds.com/products/systems/9900v/), or even heterogeneous 
networks of workstations, servers, parallel computers, and many many disks. In 
this paper all of this is viewed as a storage server. 

We concentrate on a simple model that already addresses the requirement 
of user-friendliness to a large extent. Essentially, the entire storage server is 

* Partially supported by the Future and Emerging Technologies programme of the EU 
under contract number IST-1999-14186 (ALCOM-FT). 



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 82-101, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004 








Fig. 1. A storage server appears to the outside world like a huge disk accepting many 
parallel requests from inside and outside the system 



presented to the operating system of the computers that run the applications 
(clients) as a single very large disk (see Fig. 1): There is a single logical address 
space, i.e., an array of bytes $A[0..N-1]$.$^1$ $N$ is essentially the total cumulative 
usable capacity of all disks. The clients can submit requests for bytes $A[a..b]$ to 
the storage server. There will be some delay (some milliseconds as in a physical 
disk) and then data is delivered at a high rate (currently up to 50 MByte/s 
from a single disk). Otherwise, the clients can behave completely naively: In 
particular, requests should be handled in parallel with minimal additional delays. 
Large requests (many megabytes) should be handled by many disks in parallel. If 
any single component of the system fails, no data should be lost and the effect on 
performance should be minimal. If the system is upgraded with additional disks, 
usually larger than those previously present, the logical address space should be 
extended accordingly and future requests should profit from the increased 
cumulative data rate of the system. 

The storage server can be implemented as part of the operating software of 
a monolithic system or as a distributed program with processes on the client 
computers and possibly on dedicated server machines or network attached storage, 
i.e., disks that are directly connected to the network. All these components 



$^1$ We use $a..b$ as a shorthand for the range $a, \ldots, b$, and $A[a..b]$ stands for the subarray 
$\langle A[a], \ldots, A[b] \rangle$. 









communicate over a network. Higher level functionality such as file systems or 
data base systems can be implemented on top of our virtual address space much 
in the same way they are built today on top of a physical disk. 

We will develop the algorithmic aspects of a storage server in a step by step 
manner giving intuitive arguments why they work but citing more specialized 
papers for most proofs. The basic idea is to split the logical address space A into 
fixed size logical blocks that are mapped to random disks. Sect. 3 explains that 
this is already enough to guarantee low latencies for write requests with high 
probability using a small write buffer. To get basic fault tolerance we need to 
store the data redundantly. Sect. 4 shows that two independently placed copies 
of each block suffice to also guarantee low read latency for arbitrary sets of block 
read requests. Sect. 5 demonstrates how we can support accesses to variable size 
pieces of blocks with similar performance guarantees. We are also not stuck with 
storing every block twice. Sect. 6 explains how more sophisticated encoding gives 
us control over different tradeoffs with respect to efficiency, waste of space, and 
fault tolerance. Up to that point we make the assumption that the clients submit 
batches of requests in a synchronized fashion — this allows us to give rigorous 
performance guarantees. In Sect. 7 we lift this assumption and allow requests 
to enter the storage server independently of each other. Although a theoretical 
treatment gets more difficult, the basic approach of random redundant allocation 
still works and we get simple algorithms that can be implemented in a distributed 
fashion. The algorithms described in Sect. 8 use the redundant storage when 
a disk or another component of the system fails. It turns out that the clients see 
almost nothing of the fault, not even in terms of performance. Furthermore, after 
a very short time, the system is again in a safe state where further component 
failures can be tolerated. Sect. 3 assumes that a write operation can return as 
soon as there is enough space to keep it in RAM. In Sect. 9 we explain 
what can be done if this is not acceptable because a loss of power could erase the 
RAM. For simplicity of exposition we assume most of the time that the system 
consists of D identical disks but Sect. 10 generalizes to the case of different 
capacity disks that can be added incrementally. 

2 Related Work 

A widely used approach to storage servers is RAID [27] (Redundant Arrays 
of Independent Disks). Different RAID levels (0-5) offer different combinations 
of two basic techniques: In mirroring (RAID Level 1), each disk has a mirror 
disk storing the same data. This is similar to the random duplicate allocation 
(RDA) introduced in Sect. 4 only that the latter stores each block independently 
on different disks. We will see that this leads to better performance in several 
respects. Striping (RAID Level 0) [31] is a simple and elegant way to exploit 
disk parallelism: Logical blocks are split into D equal sized pieces and each piece 
is stored on a different disk. This way, accesses to logical blocks are always 
balanced over all disks. This works well for small D, but for large D, we would 
get a huge logical block size that is problematic for applications that need fine 







grained access. Fault tolerance can be achieved at low cost by splitting logical 
blocks into $D - 1$ pieces and storing the bit-wise xor of these pieces in a parity 
block (RAID Levels 3, 5). 

Larger storage servers are usually operated in such a way that files or par- 
titions are manually assigned to small subsets of disks that are operated like 
a RAID array. The point of view taken in this paper is that this management 
effort is often avoidable without a performance penalty. In applications where 
the space and bandwidth requirements are highly dynamic, automatic methods 
may even outperform the most careful manual assignment of data to disks. 

Load balancing by random placement of data is a well known technique 
(e.g., [7], [23]). Combining random placement and redundancy was first 
considered in parallel computing for PRAM emulation [18] and online load 
balancing [6]. For scheduling disk accesses, these techniques have been used for 
multimedia applications [39], [40], [19], [24], [8], [35]. The methods described 
here are mostly a summary of four papers [34], [33], [32], [16]. Sect. 10 describes 
new results. 

There are many algorithms explicitly designed to work efficiently with coarse-grained 
block-wise access. Most use the model by Vitter and Shriver [42] that 
allows identical parallel disks and a fixed block size. Vitter [41] has written a good 
overview article. More overviews and several introductory articles are collected 
in an LNCS Tutorial [22]. 

3 Write Buffering 

3.1 Greedy Writing 

Consider the implementation of an operation write(a, B, i) that writes a client 
array $a[0..B-1]$ to the logical address space $A[i..i+B-1]$. The main observation 
exploited in this section is that write can in principle be implemented to return 
almost immediately: Just copy the data to a buffer space.$^2$ The matching read 
operation read(B, i) returns the cached data without a disk access. 

An obvious limitation of this buffering strategy is that we will eventually run out 
of buffer space without a good strategy for actually outputting the data to the 
disks. We postpone the question of how data is mapped to the disks until Sect. 3.2 
because the following greedy writing algorithm works for any given assignment of 
data to the disks. We maintain 

2 We can even do without a copy if we “steal” a from the client and only release it 
when the data is finally copied or output. 




Fig. 2. Optimal Writing 







a queue of output requests for each disk. Whenever a disk falls idle, one request 
from this queue is submitted to the disk. Fig. 2 illustrates this strategy. In some 
sense, greedy writing is optimal: 

Theorem 1 ([16]). Consider the I/O model of Vitter and Shriver [42] (fixed 
block size, fixed output cost). Assume some sequence of block writes is to be 
performed in that logical order and at most m blocks can be buffered by the 
storage server. Then greedy writing minimizes the number of I/O steps needed 
if the disk queues are managed in a FIFO (first in first out) manner. 

Proof. (Outline) An induction proof shows that greedy writing is optimal among 
all output strategies that maintain queues in a FIFO manner. Another simple 
lemma shows that any schedule can be transformed into a FIFO schedule without 
increasing the I/O time or the memory requirement. ■ 
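
As a minimal sketch of greedy writing in the unit-cost model of Theorem 1, the following Python simulation keeps one FIFO queue per disk and lets every non-empty queue output one block per I/O step. The function name greedy_write, its parameters, and the rule that blocks are accepted in logical order as soon as buffer space is free are assumptions made for illustration; they are not taken from [16].

```python
from collections import deque


def greedy_write(block_disks, num_disks, buffer_size):
    """Simulate greedy writing with unit-cost I/O steps.

    block_disks[i] is the disk that block i has to be written to (any
    fixed assignment). At most buffer_size (>= 1) blocks are buffered.
    Returns the number of I/O steps until all blocks are on disk.
    """
    queues = [deque() for _ in range(num_disks)]  # one FIFO queue per disk
    buffered = 0      # blocks currently held in the write buffer
    next_block = 0    # next block of the sequence to accept
    steps = 0
    n = len(block_disks)
    while next_block < n or buffered > 0:
        # Accept blocks in logical order while buffer space is available.
        while next_block < n and buffered < buffer_size:
            queues[block_disks[next_block]].append(next_block)
            buffered += 1
            next_block += 1
        # One I/O step: every disk with pending work writes its oldest block.
        for queue in queues:
            if queue:
                queue.popleft()
                buffered -= 1
        steps += 1
    return steps
```

With all blocks assigned to one disk the simulation degenerates to one block per step regardless of the buffer size, which is the bad case that motivates the random allocation of Sect. 3.2.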

Things get more complicated for more realistic I/O models that take into 
account that I/O times depend on the time to move the disk head between the 
position of two blocks. 

Open Problem 1: Can you find (approximately) optimal writing algorithms 
for the case that I/O costs depend on the position of blocks on the disks? Even 
for fixed block size$^3$ and cost estimates only dependent on seek time little is known 
if the buffer size is limited.$^4$ 

3.2 Random Allocation 

Theorem 1 is a bit hollow because performance can still be very bad if all blocks 
we need to write have to go to the same disk. We would like to have an allocation 
strategy that avoids such cases. But this seems impossible — for any given 
mapping of the address space to the disks, there will be sets of requests that all 
go to the same disk. Randomization offers a way out of this dilemma. We allocate 
logical blocks to random disks for some fixed (large) block size B. (Sect. 10 
discusses details how this mapping should actually be implemented.) Random 
mapping makes it very unlikely that a particular set of blocks requested by the 
clients reside on the same disk. More generally, we get the following performance 
guarantee for arbitrary sequences of write requests: 

Theorem 2 ([34], [16]). Consider the I/O model of Vitter and Shriver [42] (D 
disks, fixed block size, fixed output cost). Assume some sequence of n randomly 
mapped different blocks are to be written and at most m blocks can be buffered by 

3 Variable block sizes open another can of worms. One immediately gets NP-hard 
problems. But allowing a small amount of additional memory removes most compli- 
cations in that respect. 

4 For infinite buffer size, the problem is easy if we look at seek times only (just sort by 
track) or rotational delays only [38]. For both types of delays together we have an 
NP-hard variant of the traveling salesman problem with polynomial time solutions 
in some special cases [5]. 







the storage server. Then greedy writing accepts the last block after an expected 
number of $(1 + O(D/m))\,\frac{n}{D}$ output steps. After the last block has been accepted, 
the longest queue has length $O\left(\frac{m}{D} \log D\right)$. 

It can also be shown that longer execution times only happen with very small 
probability. 

Proof. (Outline) The optimal greedy writing algorithm dominates a “throttled” 
algorithm where in each I/O step $(1 - D/m)D$ blocks are written. The effect 
of the throttled algorithm on a single disk can be analyzed using methods from 
queuing theory if the buffer size is unlimited. The average queue length turns 
out to be bounded by $m/(2D)$ and hence the expected sum of all queue lengths 
is bounded by $m/2$. More complicated arguments establish that large deviations 
from this sum are unlikely and hence the influence of situations where the buffer 
overflows is negligible. ■ 
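
To make the random mapping of logical blocks concrete, here is a small Python sketch. It splits a byte range A[a..b] into fixed-size logical blocks and derives a pseudo-random disk for each block from a hash value. The helper names, the seed, and the use of SHA-256 are assumptions chosen for illustration only; Sect. 10 discusses how the mapping should actually be implemented.

```python
import hashlib


def blocks_for_range(a, b, block_size):
    """Indices of the logical blocks touched by the byte range A[a..b]."""
    return list(range(a // block_size, b // block_size + 1))


def disk_of_block(block, num_disks, seed=b"example-seed"):
    """Pseudo-random disk for a logical block (hypothetical hash mapping).

    The same (seed, block) pair always yields the same disk, so no
    table has to be stored. The modulo bias is negligible here because
    the 64-bit hash value is far larger than num_disks.
    """
    digest = hashlib.sha256(seed + block.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % num_disks


def disks_for_range(a, b, block_size, num_disks):
    """Map every block of the request A[a..b] to its disk."""
    return {blk: disk_of_block(blk, num_disks)
            for blk in blocks_for_range(a, b, block_size)}
```

A large request such as disks_for_range(0, 64 * 2**20 - 1, 2**20, 16) touches 64 blocks that are spread over the 16 disks, so it can be served by many disks in parallel.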

Open Problem 2: The number of steps needed to write n blocks and flush all 
buffers should be a decreasing function of the buffer size m. Prove such a monotonic 
bound that is at least as good as Theorem 1 both for large and small n. 



3.3 Distributed Implementation 

We now explain how the abstract algorithms described above can be imple- 
mented in a real system. We will assume that one or several disks are connected 
to a computer where one or several processors share a memory. Several computers 
are connected by a communication network to form the storage server. 
Disks directly attached to the network are viewed as small computers with a 
single disk attached to them. The client applications either run on the same 
system or send requests to the storage server via the network. Let us consider 
the possible routes of data in the first case: When a write operation for an array 
s is called, the data is located in the client memory on computer S. Array s 
contains data from one or several randomly mapped blocks of data. Let us focus 
on the data destined for one of the target disks t that is attached to a server 
machine T. The ideal situation would be that disk t is currently idle and the 
data is shipped to the network interface card of T which directly forwards it to 
disk t, bypassing the processor and main memory of T. Since this is difficult to 
do in a portable way and since t may be busy anyway, the more likely alternative 
is that S contacts T and asks it to reserve space to put s into the queue of t. 
If this is impossible, the execution of the write operation blocks until space is 
available or S tries to buffer s locally. Eventually, the data is transferred to the 
main memory of T. When the request gets its turn, it is transmitted to the disk 
t which means that it ends up in the local cache of this disk and is then written. 

This scenario deviates in several points from the theoretical analysis: 

— The nice performance bounds only hold when all disks share the same global 
pool of buffers whereas the implementation makes use only of the local me- 
mories of the computer hosting the target disk t. It can be shown that this 







makes little difference if the local memories are large compared to log D 
blocks. Otherwise, one could consider shipping the data to third parties 
when neither S nor T has enough local memory. But this only makes sense 
if the network is very fast. 

— Theorem 2 assumes that all written blocks are different. Overwriting a block 
that is still buffered will save us an output. But it can happen that overwrit- 
ing blocks that have recently been output can cause additional delays [34]. 
Again, this can be shown to be unproblematic if the local memory is large 
compared to log D blocks. Otherwise dynamically remapping data can help. 

— The logical blocks used for random mapping should be fairly large (cur- 
rently megabytes) in order to allow accesses close to the peak performance 
of the disks. This can cause a problem for applications that rely less on high 
throughput for consecutive accesses than on low latency for many parallel 
fine grained accesses. In that case many consecutive small blocks can lie on 
the same disk which then becomes a bottleneck. In this case it might make 
sense to use a separate address space with small logical blocks for fine grained 
accesses. 



4 Random Duplicate Allocation 



In the previous section we have seen that random allocation and some buffering 
allow us to write with high throughput and low latency. The same strategy 
seems promising for reading data. Randomization ensures that data is spread 
uniformly over the disks and buffer space can be used for prefetching data so 
that it is available when needed. Indeed, there is a far reaching analogy between 
reading and writing [16]: When we run a writing algorithm “backwards” we get 
a reading algorithm. In particular, Theorem 1 transfers. However, this reversal 
of time implies that we need to know the future accesses in advance and we pay 
the $O(m \log(D)/D)$ steps for the maximum queue length up front. 




Fig. 3. The concept of (R)andom (D)uplicate (A)llocation 






Therefore, we now bring in an additional ingredient: Each logical block is 
stored redundantly. Figure 3 illustrates this concept. For now we concentrate 
on the simple case that each block is allocated to two randomly chosen disks. 
Sect. 6 discusses generalizations. Redundancy gives us flexibility in choosing from 
where to read the data and this allows us to reduce read latencies dramatically. 
By choosing two different disks for the copies we get the additional benefit that 
no data is lost when a disk fails. 

We begin with two algorithms for scheduling a batch of n requested blocks of 
fixed size that have been analyzed very accurately: The shortest queue algorithm 
allocates the requests in a greedy fashion. Consider a block e with copies on disks 
d and d' and let $\ell(d)$ and $\ell(d')$ denote the number of blocks already planned for 
disks d and d', respectively. Then the shortest queue algorithm plans e for the 
disk with smaller load. Ties are broken arbitrarily. It can be shown that this 
algorithm produces a schedule that needs 

$$k = \frac{n}{D} + \log \ln D + O(1)$$

expected I/O steps [9]. This is very good for large n but has an additive term 
that grows with the system size. 
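
The shortest queue rule is simple enough to state in a few lines. The following Python sketch processes a batch of requests in the given order and plans each one for the less loaded of its two disks. The function names and the helper that generates random duplicate placements are illustrative assumptions, not code from the cited papers.

```python
import random


def shortest_queue(requests, num_disks):
    """Greedy batch scheduling for random duplicate allocation.

    requests is a list of pairs (d, d2) naming the two disks that hold
    a copy of the requested block. Returns (assignment, loads), where
    assignment[i] is the disk chosen for request i and loads[d] is the
    number of blocks planned for disk d.
    """
    loads = [0] * num_disks
    assignment = []
    for d, d2 in requests:
        # Plan the block for the disk with the smaller load; a tie goes to d.
        target = d if loads[d] <= loads[d2] else d2
        loads[target] += 1
        assignment.append(target)
    return assignment, loads


def random_duplicate_requests(n, num_disks, rng):
    """n requests whose two copies lie on distinct random disks."""
    return [tuple(rng.sample(range(num_disks), 2)) for _ in range(n)]
```

The number of I/O steps of the resulting schedule is max(loads); comparing it with the lower bound of n/D rounded up shows the additive overhead discussed above.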







Fig. 4. A flow network showing how five requests are allocated to three disks. The flow 
defined by the solid lines proves that the requests can be retrieved in two I/O steps 



We will see that optimal schedules do not have this problem — we can 
do better by not committing our choices before we have seen all the requests. 
Optimal schedules can be found in polynomial time [12]: Suppose we want to find 
out whether k steps suffice to retrieve all requests. Consider a flow network [2] 
that consists of four layers: A source node in the first layer is connected to each 
of n request nodes. Each request node is connected to two out of D disk nodes, 
one edge for each disk that holds a copy of the requested block. The disk 
nodes are connected to a sink node t. The edges between disk nodes and t have 
capacity k. All other edges have capacity 1. Now it is easy to see that a flow 
saturating all edges leaving the source node exists if and only if k steps are 
sufficient. A schedule can be derived from an integral maximum flow by reading 
request r from disk d if and only if the edge (r, d) carries flow. Figure 4 gives 
an example. The correct value for k can be found by trial and error. First try 








$k = \lceil n/D \rceil$, then $k = \lceil n/D \rceil + 1$, ..., until a solution is found. Korst [19] gives 
a different flow formulation that uses only D nodes and demonstrates that the 
problem can be solved in time $O(n + D^3)$. If $n = O(D)$ it can be shown that 
the problem can be solved in time $O(n \log n)$ with high probability [34]. 
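
The trial and error search for k can be sketched as follows. Instead of a general maximum flow routine, this Python sketch uses the augmenting path idea directly on the request/disk structure: a new request is placed on a disk with spare capacity, if necessary after moving already placed requests to their alternative disks. This is an illustrative assumption about one way to compute the flow, not the formulation of [12] or of Korst [19], and it makes no attempt to reach their running times.

```python
from collections import deque
from math import ceil


def feasible(requests, num_disks, k):
    """Return an assignment with at most k requests per disk, or None."""
    assigned = [None] * len(requests)
    on_disk = [set() for _ in range(num_disks)]  # requests read from each disk
    for r, choices in enumerate(requests):
        # Breadth-first search over disks for an augmenting path that ends
        # at a disk with spare capacity.
        # parent[d] = (previous disk on the path, request that moves to d).
        parent = {d: (None, r) for d in choices}
        queue = deque(choices)
        end = None
        while queue:
            d = queue.popleft()
            if len(on_disk[d]) < k:
                end = d
                break
            for q in on_disk[d]:
                a, b = requests[q]
                other = b if a == d else a
                if other not in parent:
                    parent[other] = (d, q)
                    queue.append(other)
        if end is None:
            return None  # no augmenting path: k steps do not suffice
        # Shift requests along the path, back towards the new request r.
        d = end
        while d is not None:
            prev, q = parent[d]
            if prev is not None:
                on_disk[prev].remove(q)
            on_disk[d].add(q)
            assigned[q] = d
            d = prev
    return assigned


def optimal_schedule(requests, num_disks):
    """Try k = ceil(n/D), ceil(n/D) + 1, ... until a schedule exists."""
    k = max(1, ceil(len(requests) / num_disks))
    while True:
        assignment = feasible(requests, num_disks, k)
        if assignment is not None:
            return k, assignment
        k += 1
```

For instance, optimal_schedule([(0, 1), (0, 1), (0, 1), (0, 2), (1, 2)], 3) finds a schedule with two I/O steps for five requests on three disks.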

Theorem 3 ([34]). Consider a batch of n randomly and duplicately allocated 
blocks to be read from D disks. The optimal algorithm needs at most 
$\lceil n/D \rceil + 1$ steps with probability at least $1 - O(1/D)^{\lceil n/D \rceil + 1}$. 



Proof. (Outline) Using a graph theoretical model of the problem, it can be shown 
that the requests can be retrieved in k steps if and only if there is no subset $\Delta$ of 
disks such that more than $|\Delta| k$ requested blocks have both their copies allocated 
to a disk in $\Delta$ [37]. Hence, it suffices to show that it is unlikely that such an 
overloaded subset exists. This is a tractable problem mostly because the number 
of blocks allocated to $\Delta$ is binomially distributed. ■ 



[Plot omitted; horizontal axis: n/D] 

Fig. 5. The number of I/O steps (minus the lower bound $\lceil n/D \rceil$) needed by optimal 
scheduling when scheduling $n \in 256..4096$ blocks on D = 256 disks 



Figure 5 shows the performance of the optimal scheduling algorithm. We 
only give the data for D = 256 because this curve is almost independent of the 
number of disks. We see that the performance is even better than predicted by 
Theorem 3: We can expect to get schedules that need only $\lceil n/D \rceil$ steps except if 
n is a multiple of D or slightly below. For example, when n = 3.84D, we almost 
always get a schedule with 4 steps, i.e., we are within 4 % of the best possible 
performance. We also see that for large n we can even get perfect balance when 
n is a multiple of D. 








Open Problem 3: Is there a threshold constant c such that for $n > c D \log D$, 
optimal scheduling finds a schedule with $\lceil n/D \rceil$ I/O steps even if n is a multiple 
of D? 



4.1 The Selfless Algorithm 

One problem with optimal scheduling is that we do not have good performance 
guarantees for scheduling large sets of blocks efficiently. Therefore it makes sense 
to look for fast algorithms that are close to optimal. Here we describe a linear 
time algorithm that produces very close to optimal solutions [11]. 

The selfless algorithm distinguishes between committed and uncommitted 
requests. Uncommitted requests still have a choice between two disks. Commit- 
ted requests have decided for one of the two choices. Initially, all requests are 
uncommitted. A disk d is called committed if there are no uncommitted requests 
that have d as a choice. Let the load $\ell(d)$ of disk d denote twice the number of 
requests that have committed to d plus the number of uncommitted requests 
that have d as an option. The selfless algorithm is based on two simple rules: 

1. If there is an uncommitted disk with at most $\lceil n/D \rceil$ remaining incident 
requests (committed or uncommitted), we commit all of them to this disk. 

2. Otherwise, we choose an uncommitted disk d with minimum load, choose an 
uncommitted request with d as an option, and commit it to d. 

This algorithm can be implemented to run in linear time using fairly standard 
data structures: Disks are viewed as nodes of a graph. Uncommitted requests are 
edges. Using an appropriate graph representation, edges can be removed in con- 
stant time (e.g., [21]). When a disk becomes a candidate for Rule 1, we remember 
it on a stack. The remaining nodes are kept in a priority queue ordered by their 
load. Insert, decrement-priority and delete-minimum can be implemented to run 
in amortized constant time using a slight variant of a bucket priority queue [14]. 
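
The two rules can be written down directly. The Python sketch below follows the definitions of committed disks and loads given above, but it deliberately ignores the linear time data structures: it rescans all disks in every round, so it only illustrates the decisions the algorithm makes. The function name and the tie-breaking (first disk found, arbitrary request) are assumptions.

```python
from math import ceil


def selfless_schedule(requests, num_disks):
    """Assign each request (a, b) to disk a or b using the two rules.

    Simple quadratic-time sketch, not the linear time implementation.
    Returns a list with the chosen disk for every request.
    """
    n = len(requests)
    bound = ceil(n / num_disks)
    committed = [0] * num_disks                    # requests fixed to a disk
    open_reqs = [set() for _ in range(num_disks)]  # uncommitted, by option
    for r, (a, b) in enumerate(requests):
        open_reqs[a].add(r)
        open_reqs[b].add(r)
    assignment = [None] * n

    def commit(r, d):
        a, b = requests[r]
        open_reqs[a].discard(r)
        open_reqs[b].discard(r)
        committed[d] += 1
        assignment[r] = d

    while True:
        # A disk is uncommitted while some uncommitted request may choose it.
        uncommitted = [d for d in range(num_disks) if open_reqs[d]]
        if not uncommitted:
            return assignment
        # Rule 1: a disk with few incident requests takes all of them.
        rule1 = [d for d in uncommitted
                 if committed[d] + len(open_reqs[d]) <= bound]
        if rule1:
            d = rule1[0]
            for r in list(open_reqs[d]):
                commit(r, d)
            continue
        # Rule 2: the uncommitted disk of minimum load takes one request.
        d = min(uncommitted,
                key=lambda x: 2 * committed[x] + len(open_reqs[x]))
        commit(next(iter(open_reqs[d])), d)
```

Every round commits at least one request, so the loop terminates after at most n rounds.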

If we plotted the performance of the selfless algorithm in the same way as 
in Figure 5, it would be impossible to see a difference, i.e., with large probability, 
the selfless algorithm finds an optimal schedule. This empirical observation will 
be complemented by an analytical treatment in an upcoming paper [11] using 
differential equation methods that have previously been used for the mathematically 
closely related problem of cores of random graphs [28]. 

5 Variable Size Requests 

We now drop the assumption that we are dealing with fixed size jobs that take 
unit time to retrieve. Instead, let $\ell_i \le 1$ denote the time needed to retrieve 
request i. This generalization can be used to model several aspects of storage 
servers: 

— We might want to retrieve just parts of a logical block 







— Disks are divided into zones [30] of different data density and correspondingly 
different data rate: blocks on the outer zones are faster to retrieve than 
blocks on the inner zones. We assume here that both copies of a block are 
stored on the same zone. 



The bad news is that it is strongly NP-hard to assign requests to disks so 
that the I/O time is minimized [1]. The good news is that optimal scheduling 
is still possible if we allow requests to be split, i.e., we are allowed to combine a 
request from pieces read from both copies. We make the simplifying assumption 
here that a request of size $\ell = \ell_1 + \ell_2$ stored on disks $d_1$ and $d_2$ can be retrieved 
by spending time $\ell_1$ on disk $d_1$ and time $\ell_2$ on disk $d_2$. This is approximately 
true if requests are large. 

Even the performance guarantees for random duplicate allocation transfer. 
We report a simplified version of a result from [32] that has the same form as 
Theorem 3 for unit size requests: 

Theorem 4. Consider a set R of requests with total size $n = \sum_{r \in R} \ell_r$ of randomly 
and duplicately allocated requests to be read from D disks. The optimal 
algorithm computes a schedule with I/O time at most 
$\lceil n/D \rceil + 1$ with probability at least $1 - O(1/D)^{\lceil n/D \rceil + 1}$. 



The proofs and the algorithms are completely analogous. The only difference is 
that the maximum flows will now not be integral and hence require splitting of 
requests. Splitting can also have a positive effect for unit size requests since it 
eliminates threshold effects such as the spikes visible in Fig. 5. A more detailed 
analysis indicates that the retrieval time becomes a monotonic function of the 
number of requests [32]. 

In a sense, Theorem 4 is much more important than Theorem 3. For unit 
size requests, we can relatively easily establish the expected performance of an 
algorithm by simulating all interesting cases a sufficient number of times. Here, 
this is not possible since Theorem 4 holds for a vast space of possible inputs 
(uncountably big and still exponential if we discretize the piece sizes). 



6 Reducing Redundancy 

Instead of simply replicating logical blocks, we can more generally encode a 
logical block which has r times the size of a physical block into w physical blocks 
such that reading any r out of the w blocks suffices to reconstruct the logical 
block. Perhaps the most important case is w = r + 1. Using parity encoding, 
r of the blocks are simply pieces of the logical block and the last block is the 
exclusive-or of the other blocks. A missing block can then be reconstructed by 
taking the exclusive-or of the blocks read. Parity encoding is the easiest way 
to reduce redundancy compared to RDA while maintaining some flexibility in 
scheduling. Its main drawback is that the physical blocks being read are a factor 
r smaller than the logical blocks so that high bandwidth can only be expected 







if the logical blocks are fairly large. The special case $r = D - 1$, $w = D$ yields 
the coding scheme used for RAID levels 3 and 5. 
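
Parity encoding for the case w = r + 1 can be illustrated with a short Python sketch. The logical block is cut into r pieces, the last physical block is their bytewise exclusive-or, and any single missing block is rebuilt by xoring the r blocks that were read. The function names and the convention of marking the missing block with None are assumptions made for this example.

```python
def parity_encode(logical_block, r):
    """Split logical_block into r equal pieces and append their XOR."""
    if len(logical_block) % r != 0:
        raise ValueError("block length must be a multiple of r")
    size = len(logical_block) // r
    pieces = [logical_block[i * size:(i + 1) * size] for i in range(r)]
    parity = bytearray(size)
    for piece in pieces:
        for j, byte in enumerate(piece):
            parity[j] ^= byte
    return pieces + [bytes(parity)]


def parity_decode(blocks):
    """Rebuild the logical block from r + 1 physical blocks.

    blocks lists the r data pieces followed by the parity block; at most
    one entry may be None (the block that was not read).
    """
    missing = [i for i, blk in enumerate(blocks) if blk is None]
    if len(missing) > 1:
        raise ValueError("parity encoding tolerates only one missing block")
    blocks = list(blocks)
    if missing:
        size = len(next(blk for blk in blocks if blk is not None))
        rebuilt = bytearray(size)
        for blk in blocks:
            if blk is not None:
                for j, byte in enumerate(blk):
                    rebuilt[j] ^= byte
        blocks[missing[0]] = bytes(rebuilt)
    return b"".join(blocks[:-1])  # drop the parity block
```

With a logical block of 12 bytes and r = 4 this produces five physical blocks of 3 bytes each, and the logical block can be recovered from any four of them.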

Choosing w > r + 1 can be useful if more than one disk failure is to be 
tolerated (see Sect. 8) or if we additionally want to reduce output latencies (see 
Sect. 9). A disadvantage of codes with w > r+1 is that they are computationally 
more expensive than parity-encoding [20], [15], [29], [10], [4], [8]. 

Most of the scheduling algorithms for RDA we have discussed are easy to generalize 
for more general coding schemes. Only optimal scheduling needs some additional 
consideration. However, a formulation that is a generalization of bipartite 
matching [33] yields a polynomial time algorithm. 




Fig. 6. Encoding of a logical block of size 12 into 4 physical blocks and one parity 
block of size 3 such that aligned logical requests of size 4s, $s \in \{1, 2, 3\}$, can be fulfilled 
by retrieving any 4 out of 5 physical blocks of size s 



A small trick also allows us to use general coding schemes for arbitrary request 
sizes: As before, data is allocated for large logical blocks whereas actual requests 
may retrieve parts of these blocks. But the coding is done in very small pieces 
(say sectors of size 512) and the encoded pieces are stored in the physical blocks 
in an interleaved fashion. Figure 6 gives an example. 

7 Asynchronous Access 

In Sect. 3.3 we have already explained how writing can be implemented in an 
asynchronous, distributed way by providing one thread for each disk. We now ex- 
plain how this can be generalized for read accesses in the presence of redundant 
allocation. Client requests for a block of data arrive individually in an asyn- 
chronous fashion. The clients want to have these requests answered quickly, i.e., 
they want small delays. The algorithms described in Sect. 4 can be generalized 
for this purpose [33]. For example, the shortest queue algorithm would commit 
the request to the disk that can serve it fastest. 

However, we lose most of the performance guarantees. For example, it is easy 
to develop an algorithm that minimizes the maximum delay among all known 
requests but it is not clear how to anticipate the impact of these decisions on 
requests arriving in the future. 

Open Problem 4: Give theoretical bounds for the expected latency of any of 
the asynchronous scheduling algorithms discussed in [33] as a function of D 







and $\epsilon$ in the following model: A block access on any of the D disks takes unit 
time. A request for a block arrives every $(1 + \epsilon)/D$ time units. For non-redundant 
random allocation it can be shown that the expected delay is $\Theta(1/\epsilon)$. Experiments 
and heuristic considerations suggest that time $O(\log(1/\epsilon))$ is achievable using 
redundancy. 

Asynchrony also introduces a new algorithmic concept that we want to discuss 
in more detail: Lazy decisions. The simplest lazy algorithm, lazy queuing, 
queues a request readable on disks d and d' in queues for both d and d'. The 
decision which disk actually fetches the block is postponed until the last possible 
moment. When a disk d falls idle, the thread responsible for this disk inspects 
its queue and removes one request r queued there. Then it communicates with 
the thread responsible for the other copy of r to make sure that r is not fetched 
twice. Lazy queuing has the interesting property that it is equivalent to an “omniscient” 
shortest queue algorithm, i.e., it achieves the same performance even 
if it does not know how long it takes to retrieve a request. 

Theorem 5. Consider an arbitrary request stream where a disk d needs t(d, r) 
time units to serve request r. Then the lazy queue algorithm produces the same 
schedule as a shortest queue algorithm which exactly computes disk loads by 
summing the t(d, r)-values of the scheduled requests. 
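
A small event-driven simulation shows how lazy queuing behaves. In the Python sketch below every request is entered into the queues of both disks that hold a copy, and a disk that falls idle takes the oldest request in its queue that has not yet been fetched by the other disk. The single-process event loop, the function name, and the rule that a disk finishing at time t is handled before a request arriving at the same time t are assumptions of this sketch; a real system would use one thread per disk and explicit messages between threads.

```python
import heapq
from collections import deque

FINISH, ARRIVAL = 0, 1  # at equal times, finish events are handled first


def lazy_queue(arrivals, num_disks, service_time):
    """Simulate lazy queuing.

    arrivals[r] = (time, d, d2): request r arrives at `time` and can be
    read from disk d or d2. service_time(d, r) is the time disk d needs
    for request r. Returns a list of (disk, finish_time), one per request.
    """
    queues = [deque() for _ in range(num_disks)]
    taken = [False] * len(arrivals)
    idle = [True] * num_disks
    result = [None] * len(arrivals)
    events = [(t, ARRIVAL, r) for r, (t, _, _) in enumerate(arrivals)]
    heapq.heapify(events)

    def start_next(d, now):
        # Skip requests already fetched via their other copy.
        while queues[d]:
            r = queues[d].popleft()
            if not taken[r]:
                taken[r] = True
                idle[d] = False
                finish = now + service_time(d, r)
                result[r] = (d, finish)
                heapq.heappush(events, (finish, FINISH, d))
                return

    while events:
        now, kind, x = heapq.heappop(events)
        if kind == ARRIVAL:
            _, d, d2 = arrivals[x]
            queues[d].append(x)
            queues[d2].append(x)
            for disk in (d, d2):
                if idle[disk] and not taken[x]:
                    start_next(disk, now)
        else:
            idle[x] = True
            start_next(x, now)
    return result
```

If disk 0 needs three time units per request and disk 1 only one, three simultaneous requests with copies on both disks end up as one request on disk 0 and two on disk 1, although the scheduler never looks at the service times in advance.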

The only possible disadvantage of lazy algorithms compared to “eager” algo- 
rithms such as shortest queue is that a simple implementation can incur addi- 
tional communication delays at the performance critical moment when a disk 
is ready to retrieve the next request (asking another thread and waiting for 
a reply). This problem can be mitigated by trying to agree on a primary copy 
of a request r before the previous request finishes. The disk holding the primary 
copy can then immediately fetch r and in parallel it can send a confirmation to 
the thread with the other copy. 

Figure 7 shows that RDA significantly outperforms traditional output 
schemes. Even mirroring, which has the same amount of redundancy, produces 
much larger delays when the storage server approaches its limits. Measurements 
not shown here indicate that the gap is much larger when we are interested in 
the largest delays that are encountered sufficiently often to be significant for real 
time applications such as video streaming. 

It can also be shown that fluctuations in the arrival rate of requests have little 
impact on performance if the number of requests arriving over the time 
interval of an average delay is not too big. Furthermore, the scheduling algorithms 
can be adapted in such a way that applications that need high throughput even 
at the price of large delays can coexist with applications that rely on small 
delays [33]. 

[Figure 7: plot of average delay; axis label $1/\epsilon$] 

Fig. 7. Average delays for 10' requests to D = 64 disks arriving at time intervals of 
$(1 + \epsilon)/D$. The mirror algorithm uses random allocation to a RAID-1 array. Lazy sharing 
is a refinement of lazy queue: an idle disk d leaves a request r to the alternative disk 
d'(r) if the queue of d' is shorter than the queue of d. 

8 Fault Tolerance 

When a disk fails, the peak system throughput decreases by a small factor of 
1/D. Furthermore, requests which have a copy on the faulty disk lose their 
scheduling flexibility. Since only few requests are affected, load balancing still 
works well [32]. In addition, there are now logical blocks that have less 
redundancy than the others so that additional disk failures could now lead to a loss 
of data. To get out of this dangerous situation, the lost redundancy has to be 
reestablished. This can be achieved without exchanging any hardware by 
dispersing these blocks over unused space of the other disks. This can be done 
very quickly because the data read and written for this purpose is uniformly 
distributed over all disks. If we are willing to invest a fraction $\epsilon$ of the peak 
performance of the system, the reconstruction can finish in a fraction of about 
$1/(\epsilon(D - 1))$ of the time needed to read and write one disk. For example, in a large 
system with 10 000 disks with 100 GByte each and a disk I/O rate of 50 MByte/s 
we could in principle reconstruct a failed disk in as little as ten seconds investing 
4 % of our peak I/O rate. Section 10 will explain how a random mapping of the 
data is maintained in this situation. 
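
The ten-second figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that "the time needed to read and write one disk" means one full read plus one full write at the stated I/O rate, and that 1 GByte = 10^9 bytes; the function and parameter names are illustrative.

```python
def reconstruction_time(num_disks, disk_bytes, io_rate, fraction):
    """Seconds to re-create the redundancy lost with one failed disk when
    the remaining disks each spend `fraction` of their bandwidth on it."""
    one_disk = 2 * disk_bytes / io_rate       # read plus write one full disk
    return one_disk / (fraction * (num_disks - 1))

# 10 000 disks, 100 GByte each, 50 MByte/s, 4 % of the peak I/O rate
seconds = reconstruction_time(10_000, 100e9, 50e6, 0.04)
print(f"{seconds:.1f} s")
```

Reading and writing one disk takes 4000 s under these assumptions, and dividing by 0.04 * 9999 gives roughly ten seconds, in line with the example in the text.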

Failures of disk controllers or entire machines can be handled in a similar 
manner if the random duplicate allocation is modified in such a way that different 
copies are allocated to different pieces of hardware. The ultimate realization of 
this strategy divides the storage server into halves that are physically so far apart 
that even a fire or another catastrophe is unlikely to destroy both halves at the 
same time. The limiting factor here is the cost of a high speed interconnection 
between the halves, so that for such systems one may consider keeping more than 
two copies of each block and sending data to the remote half only occasionally. 

A major challenge in practical fault tolerance is that it is very difficult to test 
the behavior of distributed software under faults. Since the data losses can be 
really expensive, storage servers might therefore be a prime candidate for formal 
verification: 

Open Problem: 5 Define a useful abstract model of a storage server and its 
software and prove that it operates correctly under disk failures, power loss, .... 

9 Reducing Write Latency 

Somewhat paradoxically, there are many applications where writes are much 
more frequent than reads, i.e., much of what is written is never read. The reason 
is that a lot of the data needed by a client can be cached in main memory (by 
the storage server or by the application). One could argue that one would not 
have to output this data at all but this neglects that many applications must be 
able to recover their old state after a power loss. 

There are several ways to handle this situation. One would be to make sure 
that write buffers are in memory with enough battery backup that they can be 
flushed to disk at a power loss. The next step on the safety ladder makes sure 
that the data is buffered in two processors with independent power supply before 
a write operation returns. But some applications will still prefer to wait until 
the data is actually output. In this situation, the strategy from Section 3 leads 
to fairly long waiting time under high load. 

Here, the generalized coding schemes outlined in Sect. 6 can be 
used. If a logical block can be reconstructed from any r out of w pieces, we 
can return from a write operation after r' pieces are output ($r \le r' \le w$) 
and we get a flexible tradeoff between write latency and fault tolerance. For 
example, for r = 1 and w = 3 we could return already when two copies have 
been written. Which copies are written can be decided using any of the scheduling 
algorithms discussed above, for example the lazy queuing algorithm from Sect. 7. 
The remaining copy is not written at all or only with reduced priority so that it 
cannot delay other time critical disk accesses. 
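
The tradeoff can be made concrete with a small sketch. The helper names and the completion times below are hypothetical; the only facts used are that a block is recoverable from any r of its w pieces and that the write returns once r' pieces are on disk.

```python
def write_latency(piece_finish_times, r_prime):
    """Time at which a write may return: when r_prime pieces are on disk."""
    if not 1 <= r_prime <= len(piece_finish_times):
        raise ValueError("need 1 <= r_prime <= w")
    return sorted(piece_finish_times)[r_prime - 1]

def tolerated_failures(r, r_prime):
    """Pieces that may be lost right after the write returns while the
    block can still be reconstructed from the r_prime pieces written."""
    return r_prime - r

finish = [12.0, 3.5, 7.25]          # made-up completion times of w = 3 copies
early = write_latency(finish, 2)    # return after two copies are written
safe = write_latency(finish, 3)     # wait for all three copies
```

With r = 1 and r' = 2 the write returns as soon as the second-fastest copy is done and the block survives the loss of one of the written copies.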

10 Inhomogeneous Dynamically Evolving Storage Servers 

A storage server that operates reliably 24h a day 365 days a year should allow us 
to add disks dynamically when the demands for capacity or bandwidth increase. 
Since technology is continuously advancing, we would like to add new disks with 
higher capacity and bandwidth than the existing disks. Even if we were 
willing to settle for the old type, this becomes infeasible after a few years when 
the old type is no longer for sale. In Sect. 8 we have already said that we want 
to be able to remove failed disks from the system without replacing them by 
new disks. The main algorithmic challenge in such systems is to maintain our 
concept of load balancing by randomly mapping logical blocks to disks. We first 
explain how a single random mapping from the virtual address space to the disks 
is obtained. 

Inhomogeneity can be accommodated by mapping a block not directly to 
the D inhomogeneous disks but first to D' volumes that accommodate $N/(D'B)$ 
blocks each. The volumes are then mapped to the disks in such a way that the 
ratio r(d) = c(d)/v(d) between the capacity c(d) of a disk d and the number 
of volumes v(d) allocated to it is about the same everywhere. More precisely, 
when a volume is allocated, it is greedily moved to the disk that maximizes 
r(d). If $D'/D \gg D \max_d c(d) / \sum_d c(d)$, we will achieve a good utilization of disk 
capacity (footnote 5). 

When a disk fails, the volumes previously allocated to it will be distributed 
over the remaining disks. This is safe as long as $\min_d r(d)$ exceeds N/D'. When 
a new disk d' is added, volumes from the disks with smallest r(d) are moved to 
the new disk until r(d') would become minimal. In order to move or reconstruct 
volumes, only the data in the affected volumes needs to be touched whereas all 
the remaining volumes remain untouched. 
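
The greedy volume placement described above might look as follows in Python. This is a sketch under one explicit assumption: since r(d) is undefined for a disk without volumes, the code ranks disks by the ratio they would have after receiving one more volume, c(d)/(v(d) + 1). The function name and this tie-free formulation are illustrative, not prescribed by the text.

```python
import heapq

def allocate_volumes(capacities, num_volumes):
    """Return v[d], the number of volumes placed on each disk d."""
    v = [0] * len(capacities)
    # Max-heap (via negated keys) on the ratio a disk would have after
    # receiving one more volume; ties go to the lower disk number.
    heap = [(-c, d) for d, c in enumerate(capacities)]
    heapq.heapify(heap)
    for _ in range(num_volumes):
        _, d = heapq.heappop(heap)
        v[d] += 1
        heapq.heappush(heap, (-capacities[d] / (v[d] + 1), d))
    return v

counts = allocate_volumes([100, 200, 400], 7)
```

For capacities in the ratio 1 : 2 : 4 and seven volumes the disks receive 1, 2, and 4 volumes, so r(d) is the same everywhere. Handling a failed or added disk would move volumes between the entries of `v` following the same ratio criterion.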

It remains to define a random mapping of blocks to volumes. We present 
a pragmatic solution that outperforms a true random mapping in certain aspects 
but where an accurate analysis of the scheduling algorithms remains an open 
question. We achieve a perfectly balanced allocation of blocks to volumes by 
striping blocks over the volumes, i.e., blocks $iD', \ldots, (i + 1)D' - 1$ are mapped in 
such a way that each volume receives one block. To achieve randomness, block 
$iD' + j$, $0 \le j < D'$, is mapped via a (pseudo)random permutation $\pi_i$ to volume 
$\pi_i(j)$. Figure 8 summarizes the translation of logical addresses into block offsets, 
disk IDs, and positions on the disk. 

In order to find out which blocks need to be moved or reconstructed when 
a disk is added or replaced, we would like to have permutations that are easy to 
invert. Feistel permutations [25] are one way to achieve that: Assume for now 
that $\sqrt{D'}$ is an integer and represent j as $j = j_a + j_b \sqrt{D'}$. Now consider the 
mapping 

$\pi_{i,1}((j_a, j_b)) = (j_b,\; j_a + f_{i,1}(j_b) \bmod \sqrt{D'})$ 

where $f_{i,1}$ is some (pseudo)random function. If we iterate such mappings two to 
four times using pseudo-random functions $f_{i,1}, \ldots, f_{i,4}$ we get something "pretty 
random". Indeed, such permutations can be shown to be random in some precise 
sense that is useful for cryptology [25]. A Feistel permutation is easy to invert: 

$\pi_{i,k}^{-1}((a, b)) = (b - f_{i,k}(a) \bmod \sqrt{D'},\; a)$ 
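
A minimal Python sketch of such a Feistel permutation and its inverse follows. The round function here is an assumption: it hashes the triple (i, k, x) with SHA-256 and reduces the result modulo m = sqrt(D'); any other pseudo-random function with the same range would do. All names are illustrative.

```python
import hashlib

def f(i, k, x, m):
    """Hypothetical round function f_{i,k}: hash (i, k, x) into 0..m-1."""
    digest = hashlib.sha256(f"{i}:{k}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def feistel(i, j, m, rounds=4):
    """Map j in 0..m*m-1 to pi_i(j), where m is the integer sqrt(D')."""
    a, b = j % m, j // m                  # j = a + b*m
    for k in range(1, rounds + 1):
        a, b = b, (a + f(i, k, b, m)) % m
    return a + b * m

def feistel_inverse(i, v, m, rounds=4):
    """Undo feistel() by applying the inverse rounds in reverse order."""
    a, b = v % m, v // m
    for k in range(rounds, 0, -1):
        a, b = (b - f(i, k, a, m)) % m, a
    return a + b * m

m = 16                                    # D' = 256 volumes in this toy case
image = [feistel(3, j, m) for j in range(m * m)]
```

Each round is a bijection regardless of the round function, so `image` contains every volume number exactly once, and `feistel_inverse` recovers j without any table. This is what makes it cheap to find the blocks stored in a given volume.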

We assume that the functions $f_{i,k}$ are represented in some compact way, e.g., 
using any kind of ordinary pseudo-random hash function h that maps triples 
(i, j, k) to values in $0..D' - 1$. In order to find out to which disk a block is 
mapped, the only additional data structure we need is a lookup table of size D'. 
This data structure is easy to replicate to the local memory of all processors. For 
example, even in a large system with D = 10000, with disk capacities varying 
by a factor of four, $D' = 2^{18}$ would already achieve quite good load balance. 
To achieve fault tolerance, this lookup table and the parameters of the hash 
function h should be stored redundantly at a predefined place. But even if the 
table gets lost, it can be reconstructed as long as the capacity of the disks and 
the order in which they were added or removed is known; we only need to 
make sure that the algorithms for mapping volumes to disks are deterministic. 

Fig. 8. How a logical address is mapped to a physical block. The numbers give an 
example with one Petabyte of address space, $B = 2^{20}$, and $D' = 2^{18}$ that would 
currently require about 10 000 disks of 100 GByte each. 

5 If disk bandwidth is more of an issue than disk capacity, we can also balance 
according to the data rate a disk can support. But even without that, the lazy scheduling 
algorithms from Sect. 7 will automatically direct some traffic away from the 
overloaded disks. 



Redundant Allocation 

In order to use the above scheme in the context of duplicate allocation, we 
partition the storage server into two partitions whose total storage capacity is 
about equal. The volumes are mapped to both partitions and we have two sets 
of random permutations — one for each partition. More generally, if we use a 
coding scheme that writes w physical blocks for each logical block, we need w 
partitions. To achieve good fault tolerance, components in different partitions 
should share as few common points of failure as possible (controllers, processors, 
power supplies, ...). Therefore, the disks will not be assigned to the partitions 
one by one but in coarse grained units like controllers or even entire machines. 

Although this partitioning problem is NP-hard, there are good approximation 
algorithms [3]. In particular, since we are dealing with a small constant number 
of partitions, fully polynomial time approximation schemes can be developed 
using standard techniques [43]. 

Maintaining reasonably balanced partitions while components enter (new 
hardware) or leave (failures) the system in an online fashion is a more compli- 
cated problem. In general, we will have to move components but these changes 
in configuration should only affect a small number of components with total 
capacity proportional to added or removed capacity. At least it is easy to main- 
tain the invariant that the difference between the capacities of the smallest and 
largest partition is bounded by the maximum component capacity. 
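
For the case where components are only added, the invariant mentioned above follows from a simple rule: always put the new component into the currently smallest partition. The Python sketch below illustrates just this case; removals, which may require moving components, are not covered, and the function names are illustrative.

```python
def add_component(partitions, capacity):
    """Put a new component into the currently smallest partition."""
    target = min(range(len(partitions)), key=lambda p: sum(partitions[p]))
    partitions[target].append(capacity)
    return target

def imbalance(partitions):
    """Capacity difference between the largest and smallest partition."""
    totals = [sum(p) for p in partitions]
    return max(totals) - min(totals)

partitions = [[], []]                    # two partitions for duplicate allocation
history = []
for capacity in [8, 3, 5, 8, 2, 7]:      # made-up component capacities
    add_component(partitions, capacity)
    history.append((imbalance(partitions), capacity))
```

After each insertion the new imbalance is at most the larger of the old imbalance and the capacity just added, so it never exceeds the maximum component capacity seen so far.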








11 Discussion 

We have introduced some of the algorithmic backbone of scalable storage servers. 
We have neglected many important aspects because we believe that they are or- 
thogonal to the concepts introduced here, i.e., their implementation does not 
much affect the decisions for the aspects discussed here: We need an infra- 
structure that allows reliable high bandwidth communication between arbitrary 
processors in the network. Although random allocation helps by automatically 
avoiding hot spots, good routing strategies can be challenging in inhomogeneous 
dynamically changing networks. 

Caching can make actual disk accesses superfluous. This is a well understood 
topic for centralized memory [17], [26] but distributed caching faces interesting 
tradeoffs between communication overhead and cache hit rate. 

There are further important issues with a different flavor such as locking 
mechanisms to coordinate concurrent accesses, file systems, real time issues, . . . 

In addition, there are interesting aspects that are less well understood yet and 
pose interesting questions for future work. For example, we have treated all data 
equally. But in reality, some data is accessed more frequently than other data. 
Besides the short term measure of caching, this leads to the question of data 
migration (e.g. [36]). Important data should be spread evenly over the disks, it 
should be allocated to the fastest zones of the disks, and it could be stored with 
higher redundancy. The bulk of the data that is accessed rarely could be stored 
on cheaper disks or even on disks that are powered down for saving energy. Such 
Massive Arrays of Idle Disks [13] are a candidate for replacing tape libraries 
and could scale to tens of thousands of disks. 



Acknowledgements. In recent years I have had many interesting discussions about 
algorithms for storage servers. Here is a partial list of people who have helped 
me to form a coherent picture of this complex issue between algorithmics, 
probability, coding, operating systems, data bases, computer architecture, market 
mechanisms, patents, ...: Joep Aerts, Eitan Bachmat, Petra Berenbrink, 
André Brinkmann, Sebastian Egner, Harald Hüls, David Irwin, Mahesh Kallahalla, 
Jan Korst, Jan Marien, Kurt Mehlhorn, Kay Salzwedel, Christian Scheideler, 
Martin Skutella, Peter Varman, Jeff Vitter, Gerhard Weikum, Winfried Wilcke, 
Gerhard Woeginger. 



References 

1. Aerts, J., Korst, J., Verhaegh, W.: Complexity of Retrieval Problems. Technical 
Report NL-MS-20.899, Philips Research Laboratories (2000) 

2. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows. Prentice Hall (1993) 

3. Alon, N., Azar, Y., Woeginger, G.J., Yadid, T.: Approximation Schemes for 
Scheduling. In: SODA (1997) 493-500 







4. Alvarez, G.A., Burkhard, W.A., Cristian, F.: Tolerating Multiple Failures in RAID 
Architectures with Optimal Storage and Uniform Declustering. In: Proceedings of 
the 24th Annual International Symposium on Computer Architecture (ISCA-97), 
volume 25,2 of Computer Architecture News New York ACM Press, June 2-4 1997 
62-72 

5. Andrews, M., Bender, M.A., Zhang, L.: New Algorithms for the Disk Scheduling 
Problem. In: IEEE, editor, 37th Annual Symposium on Foundations of Computer 
Science, IEEE Computer Society Press (1996) 550-559 

6. Azar, Y., Broder, A.Z., Karlin, A.R., Upfal, E.: Balanced Allocations. In: 
26th ACM Symposium on the Theory of Computing (1994) 593-602 

7. Barve, R.D., Grove, E.F., Vitter, J.S.: Simple Randomized Mergesort on Parallel 
Disks. Parallel Computing, 23 4 (1997) 601-631 

8. Berenbrink, P., Brinkmann, A., Scheideler, C.: Design of the PRESTO 
Multimedia Storage Network. In: International Workshop on Communication and Data 
Management in Large Networks, Paderborn Germany, October 5 1999, 2-12 

9. Berenbrink, P., Czumaj, A., Steger, A., Vöcking, B.: Balanced Allocations: The 
Heavily Loaded Case. In: 32nd Annual ACM Symposium on Theory of Computing, 
(2000) 745-754 

10. Blaum, M., Brady, J., Bruck, J., Menon, J.: EVENODD: An Optimal Scheme for 
Tolerating Double Disk Failures in RAID Architectures. In: Proceedings of the 
21st Annual International Symposium on Computer Architecture (1994) 245-254 

11. Cain, J., Sanders, P., Wormald, N.: A Random Multigraph Process for Linear 
Time RDA Disk Scheduling. Manuscript in preparation (2003) 

12. Chen, L.T., Rotem, D.: Optimal Response Time Retrieval of Replicated Data 
(Extended abstract). In: 13th ACM Symposium on Principles of Database Systems, 
ACM Press 13 (1994) 36-44 

13. Colarelli, D., Grunwald, D.: Massive Arrays of Idle Disks for Storage Archives. In: 
SC’2002 Conference CD, IEEE/ACM SIGARCH, Baltimore, MD, November 2002 

14. Dial, R.B.: Algorithm 360: Shortest-Path Forest with Topological Ordering. Com- 
munications of the ACM 12 11 (1969) 632-633 

15. Gibson, G.A., Hellerstein, L., Karp, R.M., Katz, R.H., Patterson, D.A.: Coding 
Techniques for Handling Failures in Large Disk Arrays, CSD-88-477. Technical 
report, U.C. Berkeley (1988) 

16. Hutchinson, D.A., Sanders, P., Vitter, J.S.: Duality between Prefetching and 
Queued Writing with Parallel Disks. In: 9th European Symposium on Algorithms 
(ESA), LNCS, Vol. 2161. Springer (2001) 62-73 

17. Irani, S.: Competitive Analysis of Paging. In: Online Algorithms: The State of 
the Art, LNCS, Vol. 1442. Springer (1998) 52-73 

18. Karp, R.M., Luby, M., Meyer auf der Heide, F.: Efficient PRAM Simulation on 
a Distributed Memory Machine. In: 24th ACM Symp. on Theory of Computing, 
May 1992 318-326 

19. Korst, J.: Random Duplicate Assignment: An Alternative to Striping in Video 
Servers. In: ACM Multimedia, Seattle (1997) 219-226 

20. MacWilliams, F.J., Sloane, N.J.A.: Theory of Error-Correcting Codes. North- 
Holland (1988) 

21. Mehlhorn, K., Näher, S.: The LEDA Platform of Combinatorial and Geometric 
Computing. Cambridge University Press (1999) 

22. Meyer, U., Sanders, P., Sibeyn, J. (eds): Algorithms for Memory Hierarchies, LNCS 
Tutorial, Vol. 2625. Springer (2003) 

23. Miller E.L., Katz, R.H.: RAMA: An Easy-to-Use, High-Performance Parallel File 
System. Parallel Computing 23 (1997) 419-446 







24. Muntz, R., Santos, J.R., Berson, S.: A Parallel Disk Storage System for Real-Time 
Multimedia Applications. International Journal of Intelligent Systems 13 (1998) 
1137-1174 

25. Naor, M., Reingold, O.: On the Construction of Pseudorandom Permutations: 
Luby-Rackoff Revisited. Journal of Cryptology: the Journal of the International 
Association for Cryptologic Research 12 1 (1999) 29-66 

26. O’Neil, O’Neil, Weikum: An Optimality Proof of the LRU-K Page Replacement 
Algorithm. JACM: Journal of the ACM 46 1999 

27. Patterson, D., Gibson, G., Katz, R.: A Case for Redundant Arrays of Inexpensive 
Disks (RAID). Proceedings of ACM SIGMOD’88 (1988) 109-116 

28. Pittel, B., Spencer, J., Wormald, N.: Sudden Emergence of a Giant k-Core in 
a Random Graph. J. Combinatorial Theory, Series B, 67 (1996) 111-151 

29. Rabin, M.O.: Efficient Dispersal of Information for Security, Load Balancing and 
Fault Tolerance. Journal of the ACM, 36 2 (1989) 335-348 

30. Ruemmler, C., Wilkes, J.: An Introduction to Disk Drive Modeling. IEEE Com- 
puter, 27 3 March 1994 17-28 

31. Salem, K., Garcia-Molina, H.: Disk Striping. Proceedings of Data Engineering’86 
(1986) 

32. Sanders, P.: Reconciling Simplicity and Realism in Parallel Disk Models. Parallel 
Computing, 28 5 (2002) 705-723. Short version in 12th SODA, (2001) 67-76 

33. Sanders, P.: Asynchronous Scheduling of Redundant Disk Arrays. IEEE 
Transactions on Computers 52 9 (2003) 1170-1184. Short version in 12th ACM Symposium 
on Parallel Algorithms and Architectures (2000) 89-98 

34. Sanders, P., Egner, S., Korst, J.: Fast Concurrent Access to Parallel Disks. Algo- 
rithmica 35 1 (2003) 21-55. Short version in 11th SODA (2000) 849-858 

35. Santos, J.R., Muntz, R.R., Ribeiro-Neto, B.: Comparing Random Data Allocation 
and Data Striping in Multimedia Servers. In: ACM SIGMETRICS (2000) 44-55 

36. Scheuermann, P., Weikum, G., Zabback, P.: Data Partitioning and Load Balancing 
in Parallel Disk Systems. VLDB Journal: Very Large Data Bases, 7 1 (1998) 48-66 

37. Schoenmakers, L.A.M.: A New Algorithm for the Recognition of Series Parallel 
Graphs. Technical Report CS-R9504, CWI - Centrum voor Wiskunde en Infor- 
matica, January 31, 1995 

38. Stone, H.S., Fuller, S.F.: On the Near-Optimality of the 
Shortest-Access-Time-First Drum Scheduling Discipline. Communications of the ACM 16 6 (1973) 
352-353. Also published in/as: Technical Note No. 12, DSL 

39. Tetzlaff, W., Flynn, R.: Block Allocation in Video Servers for Availability and 
Throughput. Proceedings Multimedia Computing and Networking (1996) 

40. Tewari, R., Mukherjee, R., Dias, D.M., Vin, H.M.: Design and Performance Trade- 
offs in Clustered Video Servers. Proceedings of the International Conference on 
Multimedia Computing and Systems (1996) 144-150 

41. Vitter, J.S.: External Memory Algorithms and Data Structures: Dealing with 
Massive Data. ACM Computing Surveys 33 2 (2001) 209-271 

42. Vitter, J.S., Shriver, E.A.M.: Algorithms for Parallel Memory, I: Two Level Mem- 
ories. Algorithmica 12 2/3 (1994) 110-147 

43. Woeginger, G.J.: When Does a Dynamic Programming Formulation Guarantee the 
Existence of a Fully Polynomial Time Approximation Scheme (fptas). INFORMS 
Journal on Computing 12 (2000) 57-75 




Fuzzy Unification and Argumentation for Well-Founded Semantics 



Ralf Schweimeier 1 and Michael Schroeder 1,2 



1 Department of Computing, City University 
Northampton Square, London EC1V 0HB, UK 
{ralf,msch}@soi.city.ac.uk 

2 Department of Computer Science, Technische Universität Dresden 
01062 Dresden, Germany 



Abstract. Argumentation as metaphor for logic programming semantics is 
a sound basis to define negotiating agents. If such agents operate in an open 
system, they have to be able to negotiate and argue efficiently in a goal-directed 
fashion and they have to deal with uncertain and vague knowledge. In this paper, 
we define an argumentation framework with fuzzy unification and reasoning for 
the well-founded semantics to handle uncertainty. In particular, we address three 
main problems: how to define a goal-directed top-down proof procedure for justi- 
fied arguments, which is important for agents which have to respond in real-time; 
how to provide expressive knowledge representation including default and explicit 
negation and uncertainty, which is among others part of agent communication lan- 
guages such as FIPA or KQML; how to deal with reasoning in open agent systems, 
where agents should be able to reason despite misunderstandings. 

To deal with these problems, we introduce a basic argumentation framework and 
extend it to cope with fuzzy reasoning and fuzzy unification. For the latter case, 
we develop a corresponding sound and complete top-down proof procedure. 



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 102-121, 2004. 
(c) Springer-Verlag Berlin Heidelberg 2004 

1 Introduction 

Argumentation has been widely studied as the basis for the semantics of logic 
programs [1], [2], [3], [4]. Basically, the execution of a logic program can be described 
as a dialogue of a proponent defending a goal and an opponent attacking it. Recently, 
argumentation has been applied to describe and define negotiation of agents [5], [6], [7], 
[8], [9], [10], [11]. In contrast to, e.g., negotiation by auctions, argumentation is a natural 
mechanism to negotiate about multiple criteria and to establish joint beliefs. Initial work 
in this area [5], [6], [7], [8] gave a proof-of-concept for arguing agents. In this paper, we 
want to build on this work and go a step further by addressing problems which arise when 
trying to move from a proof-of-concept to an efficient implementation. To this end, there 
are three main problems which need to be addressed: 

- Expressive knowledge representation: At the centre of most agents is a knowledge 
system with an inference mechanism. This can range from a simple database to a 
fuzzy factbase [12], [13]. A factbase has tables to store positive and negative 
knowledge and as a consequence comprises two kinds of negation, explicit and default 
negation [12]. Such expressiveness is often needed. The widely used KQML [14], for 
example, distinguishes untell(A) from tell(¬A). To implement this KQML feature 
one has to represent positive and explicitly negative facts separately. Furthermore, 
one requires two types of negation: explicit negation to state that something is in 
the negative table and default negation to state that something is not in the positive 
table. As a result, agents compliant with this KQML feature have to be capable of 
dealing with a three-valued logic (true, false, unknown). 

Furthermore, the agents' beliefs may be fuzzy. Such a concept of uncertainty is 
e.g. built into FIPA ACL [15] in the form of an uncertainty operator. Since agents 
operate in uncertain environments and encounter fuzzy concepts, they have to be 
able to represent such uncertainty and fuzziness and reason about it. This applies 
to argumentation in particular, where the attacking arguments, which the agents 
exchange, may be qualified. This poses a particular problem if combined with a rich 
knowledge representation for positive and negative knowledge, so that explicit and 
default negation need to be defined for fuzzy reasoning. 

- Mismatches in open systems: Many arguing agents will operate in open systems. 
This means that their knowledge and ontologies are defined by different people and 
will not necessarily match, leading to misunderstandings. It is doubtful [16], whether 
the general ontology problem of how to integrate different ontologies will be solved 
in the near future. Nonetheless one can aim to facilitate agent communication despite 
missing parameters and mismatches in the predicate and parameter names. This is 
especially a problem when agents interact across system boundaries or with humans. 

- Goal-directed, top-down proof procedures for justified arguments: Previous 
work on arguing agents [5, 6, 7, 8] defines justified arguments as a fixpoint of accept- 
able arguments. Such a definition is elegant and well-suited to define a declarative 
semantics, but it does not lend itself well for an efficient implementation needed 
for agents, which have to react in real-time. The reason is that the above fixpoint is 
computed bottom-up, which requires an agent to compute all justified arguments - 
a heavy burden when the negotiation is only about a single predicate. Agents, which 
are to perform in real-time, require an efficient proof-procedure. A goal-directed 
top-down proof procedure, which allows the agent to determine for an individual 
argument whether it is justified or not, satisfies this need. 

In this paper, we will address these three problems by extending established 
approaches of unification and of well-founded semantics for extended logic programming 
(ELP) in two directions: 

- Fuzzy Reasoning: To tackle the problem of expressive knowledge representation 
for argumentation, we will develop a framework for fuzzy argumentation, which 
comprises two kinds of negation. We have to take important decisions on how to 
interpret fuzziness and negation. Since we place emphasis on extending previous 
work, our main goal is to conservatively extend an existing semantics, namely WFSX 
[17], to define a fuzzy bottom-up argumentation semantics. Finally, we will solve the 
first problem of efficient computation. We will reap the benefits of using WFSX as a 
base semantics. In contrast to stable models [18], WFSX provides both a bottom-up 
fixpoint semantics and a goal-directed top-down proof procedure [17]. Therefore we 
are able to complement our declarative bottom-up argumentation semantics with an 
efficiently computable top-down proof procedure for fuzzy argumentation. 

- Fuzzy Unification: We will introduce fuzzy unification to tackle the problem of miss- 
ing parameters and mismatching predicates and parameters in agent communication 
languages in general. While classical unification either unifies two predicates or it 
does not, fuzzy unification qualifies the degree of match. Our fuzzy unification is 
based on edit distance [19], which compares strings. To use it for unification, we will 
show how to normalise it and adapt it to tree structures of strings. We will extend edit 
distance accordingly and will prove as a general result that our fuzzy unification is a 
conservative extension of classical unification. Therefore it lends itself to integration 
for a wide variety of agent systems, which incorporate the notion of unification. As 
a particular instance, we will show how to embed fuzzy unification into the proof 
procedure and semantics for extended logic programs. 
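
To give a flavour of what a "degree of match" between two names could look like, here is a small Python sketch of the classical edit distance together with one possible normalisation. The normalisation by the length of the longer string and the name `match_degree` are assumptions made for illustration only; they are not the definition developed later in this paper, and the extension to tree structures is not shown.

```python
def edit_distance(s, t):
    """Minimal number of single-character insertions, deletions and
    substitutions needed to turn s into t (dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete a
                           cur[j - 1] + 1,             # insert b
                           prev[j - 1] + (a != b)))    # substitute a by b
        prev = cur
    return prev[-1]

def match_degree(s, t):
    """Assumed normalisation: 1.0 for identical strings, 0.0 when every
    position of the longer string has to be edited."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest
```

With this normalisation, two predicate names such as `colour` and `color` differ by one edit and match to a degree of 5/6, whereas identical names match to degree 1, which mirrors the idea that classical unification is the special case of a perfect match.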

The rest of the paper is structured as follows: We will give an overview of extended 
logic programming, well-founded semantics, and argumentation. Next, we will extend 
ELP to facilitate fuzzy reasoning in section 3. Independently, we will show in section 4 
how to define fuzzy unification and incorporate it into the well-founded semantics. We 
will conclude with a comparison to other work. 



2 Background and Definitions 

2.1 Knowledge Systems 

Knowledge representation and logic programming can be seen in the broader context of 
knowledge systems [12], which are based on relational databases [20]. Wagner 
structures the extensions of relational databases along three axes: deduction to cater for 
recursion, negation to capture explicitly false statements, and fuzziness to model 
vagueness [12], [21]. 

- Deduction. The standard query language for relational databases, SQL, is not Tu- 
ring-complete. In particular, it lacks recursion, and therefore concepts like the tran- 
sitive closure of a relation cannot be expressed in SQL. For this reason, relational 
databases are usually embedded in a 3-tier architecture with some high-level, Turing- 
complete programming language on top of SQL on the database-tier. In deductive 
databases, the Turing-completeness is achieved more elegantly by adding rules and 
deduction, which cater for recursion. 

- Negative information. Relational databases come with a principle of “negation by 
default” whereby a fact is assumed to be false if it is not contained in the database. 
In many circumstances, however, it is desirable to state negative facts explicitly. 
A database with explicit negative information, also called a factbase [12], consists 
of two databases: one for positive information, and one for negative information. 
This modification gives rise to two concepts absent in relational databases: 

• Undefinedness. A fact may not be contained in either the positive or the negative 
database. It is then considered unknown, or undefined. 







• Inconsistency. A fact may be contained in both the positive and the negative 
database, i.e. the database is inconsistent. In this case, one may be interested in 
methods of regaining consistency by dropping either the positive or the negative 
assumption. 

A coherence principle may be imposed to relate explicit negation and negation by 
default: if a fact is explicitly false, then it is also false by default. 

- Fuzziness. In many cases, information is not clearly defined as simply true or false. 
Fuzzy logic deals with this kind of imprecise information. There are two important 
cases of unclear information: 

• Uncertain information. For some concepts whose truth value is crisp, i.e. either 
true or false, there may only be uncertain, or statistical, or probabilistic evidence 
as to whether they are true or not. For example, a weather forecast might make 
a statement “there is a 70% chance of rain tomorrow”; the statement whether 
there is rain tomorrow is either true or false (tomorrow), but today there is no 
certain evidence for the truth of the statement. 

• Fuzzy concepts. Some concepts, such as “tall”, or “cheap”, are inherently fuzzy; 
for instance, while 7’ would be “tall” for an adult male, and 5’ would not, 6’ 
might be considered “tall” to some degree. 
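
The negative-information axis can be illustrated with a minimal sketch. It assumes a factbase is simply a pair of sets of ground facts; the names `Status`, `status`, `false_by_default` and the sample facts are hypothetical and not taken from [12].

```python
from enum import Enum


class Status(Enum):
    TRUE = "true"
    FALSE = "explicitly false"
    UNDEFINED = "undefined"
    INCONSISTENT = "inconsistent"


def status(fact, positive, negative):
    """Classify a fact against the positive and the negative database."""
    in_pos, in_neg = fact in positive, fact in negative
    if in_pos and in_neg:
        return Status.INCONSISTENT
    if in_pos:
        return Status.TRUE
    if in_neg:
        return Status.FALSE
    return Status.UNDEFINED


def false_by_default(fact, positive, negative):
    """Negation by default, extended by the coherence principle.

    A fact is false by default if it is absent from the positive database,
    or (coherence) if it is explicitly false.
    """
    return fact in negative or fact not in positive


# Hypothetical example data.
positive = {"int(agents)", "int(logic)", "spam(list42)"}
negative = {"int(cooking)", "spam(list42)"}
```

With this data, `int(cooking)` is explicitly false and therefore also false by default, `int(music)` is undefined, and `spam(list42)` is inconsistent because it occurs in both databases.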

These dimensions of knowledge systems have been examined in [12]. The aim of 
this article is to provide a framework for knowledge representation and reasoning 
encompassing these dimensions, i.e. a fuzzy deductive factbase, while bearing in mind the 
following requirements: 

- It should be characterised by a declarative fixpoint argumentation semantics. 

- There should be an efficient, goal-driven, top-down proof procedure. 

- It should handle both fuzzy reasoning and fuzzy unification. 

We will build on extended logic programming [22], [17] which combines deduction 
with the ability to model negative information, and extend it by fuzziness. There is al- 
ready an argumentation semantics for the well-founded semantics with explicit negation, 
WFSX [23], as well as an efficient proof procedure [17]. Before we proceed, we will 
review the syntax and argumentation semantics for extended logic programs. 



2.2 Extended Logic Programming and Argumentation 

Prolog [24] is the language of choice for logic programming. But many knowledge 
representation problems require a richer language capable of expressing explicitly false 
statements. One such extension is extended logic programs, which provide 
two kinds of negation: explicit negation, denoted by $\neg$, and default negation, 
denoted by $\mathit{not}$. 

Definition 1. (Extended logic program) An objective literal is an atom $A$ or its explicit 
negation $\neg A$. We define $\neg\neg L = L$. A default literal is of the form $\mathit{not}\ L$ where $L$ is an 
objective literal. A literal is either an objective or a default literal. 

An extended logic program is a (possibly infinite) set of rules of the form 







R. Schweimeier and M. Schroeder 



$$L_0 \leftarrow L_1, \ldots, L_m, \mathit{not}\ L_{m+1}, \ldots, \mathit{not}\ L_{m+n} \qquad (m, n \geq 0),$$

where each $L_i$ is an objective literal ($0 \leq i \leq m+n$). For such a rule $r$, we call $L_0$ 
the head of the rule, $head(r)$, and $L_1, \ldots, \mathit{not}\ L_{m+n}$ the body of the rule, $body(r)$. 
A rule with an empty body is called a fact, and we often write $L_0$ or $L_0 \leftarrow true$ instead 
of $L_0 \leftarrow$. 



Example 1. Consider an information agent, which can subscribe to and unsubscribe from 
mailing lists on behalf of its user. The agent holds a list of topics interesting or 
uninteresting to the user. The user is not interested in anything outside computer science. 
Within computing, he or she likes agents and logic. The user is also not interested if the 
list contains any spam. 



    int(agents) 
    ¬int(X) ← not partOf(X, cs) 
    ¬int(X) ← spam(X) 
    sub(X) ← int(X) 
    ¬sub(X) ← not int(X) 

Initially, well-founded models [25], [17] and stable models [18] were put forward 
as a semantics that can deal with implicit and explicit negation. Later it turned out 
that the metaphor of argumentation [8], [5], [1], [26] is an elegant approach to capture 
the semantics of extended logic programs intuitively [1], [3]. 

In general, an argument A is a proof which may use a set of defeasible assumptions. 
Another argument B may have a conclusion which contradicts the assumptions or the 
conclusions of A , and thereby B attacks A. 

Given a logic program we can define an argumentation semantics by iteratively 
collecting those arguments which are acceptable to a proponent, i.e. they can be defended 
against all opponent attacks. In fact, such a notion of acceptability can be defined in a 
number of ways depending on which attacks we allow the proponent and opponent to 
use. 

In extended logic programs [22], [17], there exist a variety of notions of attack [23] 
and consequently a variety of argumentation semantics. We define a general framework 
first, and later use the semantics which is equivalent to the well-founded semantics with 
explicit negation, WFSX [17], because there is an efficient proof procedure for it [17]. 

Our actual definition of an argument for an extended logic program is based on [4]. 
Essentially, an argument is a partial proof, resting on a number of assumptions, i.e. a set 
of default literals. 

Definition 2. (Argument) Let $P$ be an extended logic program. An argument for $P$ is 
a finite sequence $A = [r_1, \ldots, r_n]$ of ground instances of rules $r_i \in P$ such that 

- for every $1 \leq i \leq n$, for every objective literal $L_j$ in the body of $r_i$ there is a $k > i$ 
such that $head(r_k) = L_j$. 

The head of a rule in $A$ is called a conclusion of $A$, and a default literal $\mathit{not}\ L$ in 
the body of a rule of $A$ is called an assumption of $A$. Given an extended logic program 
$P$, we denote the set of arguments for $P$ by $Args_P$. 







Example 2. (cont.) Consider the program $P$ as defined above. Then 

$A_1$ = [partOf(agents, cs)] 

$A_2$ = [¬int(agents) ← not partOf(agents, cs)] 

$A_3$ = [int(agents)] 

$A_4$ = [¬sub(agents) ← not int(agents)] 

are arguments. 

There are two fundamental notions of attack: undercut, which invalidates an assumption 
of an argument, and rebut, which contradicts a conclusion of an argument [2], [4]. 

Definition 3. (Notions of Attack) Let $A_1$ and $A_2$ be arguments. Then a notion of attack 
is a binary relation between arguments. We will consider the following notions of attack. 

1. $A_1$ undercuts $A_2$ if there is an objective literal $L$ such that $L$ is a conclusion of $A_1$ 
and $\mathit{not}\ L$ is an assumption of $A_2$. The binary relation of undercuts is denoted by $u$. 

2. $A_1$ rebuts $A_2$ if there is an objective literal $L$ such that $L$ is a conclusion of $A_1$ and 
$\neg L$ is a conclusion of $A_2$. 

3. $A_1$ attacks $A_2$ if $A_1$ undercuts $A_2$ or if $A_1$ rebuts $A_2$. The binary relation of attacks 
is denoted by $a$. 

Example 3. (cont.) Consider the arguments $A_1, A_2, A_3, A_4$ as defined above. Then $A_1$ 
undercuts $A_2$, $A_2$ rebuts $A_3$ and vice versa (as rebuts are symmetric), and $A_3$ 
undercuts $A_4$. 
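
The definitions of argument, undercut and rebut can be checked mechanically on this example. The following is a minimal sketch, assuming ground literals are encoded as strings with a leading `-` for explicit negation; the `Rule` type and all function names are illustrative choices, not part of the paper.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    head: str          # objective literal in the head
    pos: tuple = ()    # objective literals in the body
    naf: tuple = ()    # literals L occurring as "not L" in the body


def neg(lit):
    """Explicit negation, with double negation cancelling out."""
    return lit[1:] if lit.startswith("-") else "-" + lit


def is_argument(rules):
    """Definition 2: every objective body literal is the head of a later rule."""
    return all(
        any(later.head == lit for later in rules[i + 1:])
        for i, rule in enumerate(rules)
        for lit in rule.pos
    )


def conclusions(arg):
    return {rule.head for rule in arg}


def assumptions(arg):
    return {lit for rule in arg for lit in rule.naf}


def undercuts(a, b):
    return bool(conclusions(a) & assumptions(b))


def rebuts(a, b):
    return any(neg(lit) in conclusions(b) for lit in conclusions(a))


def attacks(a, b):
    return undercuts(a, b) or rebuts(a, b)


A1 = (Rule("partOf(agents,cs)"),)
A2 = (Rule("-int(agents)", naf=("partOf(agents,cs)",)),)
A3 = (Rule("int(agents)"),)
A4 = (Rule("-sub(agents)", naf=("int(agents)",)),)
```

Here `undercuts(A1, A2)`, `rebuts(A2, A3)`, `rebuts(A3, A2)` and `undercuts(A3, A4)` all hold, in line with Example 3.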

Given the above notions of attack, we define acceptability of an argument. Basically, 
an argument is acceptable if it can be defended against any attack. Depending on which 
particular notion of attack we use as defence and which for the opponent’s attacks, we 
obtain a host of acceptability notions. 

Acceptability forms the basis for our argumentation semantics, which is defined as 
the least fixpoint of a function, which collects all acceptable arguments. The least fixpoint 
is of particular interest [4], [2], because it provides a canonical fixpoint semantics and it 
can be constructed inductively. 

Definition 4. (Acceptability) Let $x$ and $y$ be notions of attack. Let $A$ be an argument, 
and $S$ a set of arguments. Then $A$ is $x/y$-acceptable wrt. $S$ if for every argument $B$ such 
that $(B, A) \in x$ there exists an argument $C \in S$ such that $(C, B) \in y$. 

Based on the notion of acceptability, we can then define a fixpoint semantics for 
arguments. 

Definition 5. (Justified Arguments) Let $x$ and $y$ be notions of attack, and $P$ an extended 
logic program. The operator $F_{P,x/y} : \mathcal{P}(Args_P) \to \mathcal{P}(Args_P)$ is defined as 

$$F_{P,x/y}(S) = \{A \mid A \text{ is } x/y\text{-acceptable wrt. } S\}$$

We denote the least fixpoint of $F_{P,x/y}$ by $J_{P,x/y}$. If the program $P$ is clear from the 
context, we omit the subscript $P$. An argument $A$ is called $x/y$-justified if $A \in J_{x/y}$; an 
argument is called $x/y$-overruled if it is attacked by an $x/y$-justified argument; and an 
argument is called $x/y$-defensible if it is neither $x/y$-justified nor $x/y$-overruled. 







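Proposition 1. For any program $P$, the least fixpoint exists by the Knaster-Tarski fixpoint 
theorem [27], [28], because $F_{P,x/y}$ is monotone. It can be constructed by transfinite 
induction as follows: 

$$\begin{aligned}
J^{0}_{x/y} &= \emptyset \\
J^{\alpha+1}_{x/y} &= F_{P,x/y}(J^{\alpha}_{x/y}) && \text{for } \alpha + 1 \text{ a successor ordinal} \\
J^{\lambda}_{x/y} &= \bigcup_{\alpha < \lambda} J^{\alpha}_{x/y} && \text{for } \lambda \text{ a limit ordinal}
\end{aligned}$$

Then there exists a least ordinal $\lambda_0$ such that $F_{P,x/y}(J^{\lambda_0}_{x/y}) = J^{\lambda_0}_{x/y} = J_{x/y}$. 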

Example 4. (cont.) Consider the arguments $A_1, A_2, A_3, A_4$ as defined above. Then 

$J^{0}_{u/a} = \emptyset$ 

$J^{1}_{u/a} = \{A_1, A_3\}$ 

$J^{2}_{u/a} = \{A_1, A_3\}$ 

Thus $A_1$ = [partOf(agents, cs)] and $A_3$ = [int(agents)] are justified. Since they 
attack $A_2$ = [¬int(agents) ← not partOf(agents, cs)] and $A_4$ = [¬sub(agents) ← 
not int(agents)], these two are overruled. 
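
For a finite set of arguments, the inductive construction of the least fixpoint terminates after finitely many steps and can be sketched directly. The code below is an illustrative assumption rather than the authors' procedure: it restricts attention to the four arguments of the running example, encodes each as a pair (conclusions, assumptions), and uses a leading `-` for explicit negation.

```python
# Each argument is (set of conclusions, set of literals assumed false by default).
ARGS = {
    "A1": ({"partOf(agents,cs)"}, set()),
    "A2": ({"-int(agents)"}, {"partOf(agents,cs)"}),
    "A3": ({"int(agents)"}, set()),
    "A4": ({"-sub(agents)"}, {"int(agents)"}),
}


def neg(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit


def undercuts(a, b):
    return bool(ARGS[a][0] & ARGS[b][1])


def rebuts(a, b):
    return any(neg(lit) in ARGS[b][0] for lit in ARGS[a][0])


def attacks(a, b):
    return undercuts(a, b) or rebuts(a, b)


def acceptable(a, s, x, y):
    """a is x/y-acceptable wrt. s: every x-attacker of a is y-attacked from s."""
    return all(any(y(c, b) for c in s) for b in ARGS if x(b, a))


def justified(x, y):
    """Iterate S -> F(S) from the empty set until the least fixpoint is reached."""
    s = set()
    while True:
        nxt = {a for a in ARGS if acceptable(a, s, x, y)}
        if nxt == s:
            return s
        s = nxt


J = justified(undercuts, attacks)                       # the u/a semantics
overruled = {a for a in ARGS if any(attacks(j, a) for j in J)}
```

On this input the iteration stabilises after two steps with `J` containing `A1` and `A3`, and `overruled` containing `A2` and `A4`, matching Example 4.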

It is shown in [23] that the argumentation semantics $J_{u/a}$ (where an argument is 
acceptable if every undercut can be attacked) is equivalent to the well-founded semantics 
with explicit negation, WFSX [17]. This is of particular importance, because an efficient, 
goal-directed, top-down proof procedure, SLX [29], exists for WFSX. Because of the 
equivalence of WFSX and our argumentation semantics, this proof procedure can be 
used to compute justified arguments. When we extend our argumentation framework 
with fuzziness, we will build on SLX to obtain sound and complete proof procedures. 

3 Fuzzy Reasoning 

We will now define a logic programming language suitable for implementing a fuzzy 
deductive factbase, i.e. given a deductive factbase as sketched in the previous section, we 
extend it and move to a fuzzy deductive factbase. It has two kinds of negation, explicit 
negation for specifying falsity of a fact explicitly, and implicit negation for deriving 
information under the assumption of falsity of a fact. Uncertain information may be 
specified by assigning a fuzzy truth value to a fact. In accordance with the database 
view, we do not allow rules to have a fuzzy truth value. 

Definition 6. (Strength of a fact) Let $P$ be an extended logic program and $L \leftarrow true$ 
a fact in $P$. Then $str(L \leftarrow true)$ denotes the fuzzy truth value of the fact $L$, where 
$0 \leq str(L \leftarrow true) \leq 1$. For convenience, we will sometimes also write $L : V$, where 
$V = str(L \leftarrow true)$. A fact with a fuzzy truth value is also called a fuzzy fact. 

Example 5. (Cont.) The user qualifies his or her interest in agents to be 80% relevant. 
Also the topic agents is not solely part of computer science and the uncertainty of spam 
is 20 %: 

int(agents) : 80% 
partOf (agents, cs) : 70% 
spam(agents) : 20% 







When defining fuzzy extended logic programming, there are three main issues: 

- How is the fuzzy truth value of negated literals defined? 

- How can the fuzzy truth value of a justified conclusion be derived in a goal-driven, 
top-down manner? 

- How can backward compatibility to existing approaches be maintained? 

Starting with the last requirement, we take the important decision to use WFSX as 
a base semantics for fuzzy argumentation, as WFSX is established and as there is a proof 
procedure for WFSX. As we shall see this decision has important implications for the 
interpretation of fuzziness and negation. 

3.1 Fuzzy Negation 

In a fuzzy version of extended logic programming, literals are assigned truth values 
in the interval [0, 1], rather than simply true (1) or false (0). To define conjunction 
and disjunction, we use the standard fuzzy definitions via minimum and maximum, 
respectively. The main problem is then to find a suitable fuzzy semantics for the two 
kinds of negation. The standard definition of fuzzy negation according to Zadeh [30] is 
given by 

$$A : V \ \text{ implies } \ \neg A : 1 - V \qquad (1)$$

In our setting this does not work: WFSX is para-consistent, i.e. positive and negative 
information is independent of each other. When arguing there can be both evidence for 
and against A. Thus, it does not make sense to define the truth value of an explicit negation 
$\neg A$ by the truth value of $A$, as it would mean that positive information takes precedence 
over negative information. In fact, in the WFSX semantics, the only connection between 
an explicit negation $\neg A$ and $A$ is indirect via the coherence principle, relating explicit 
and implicit negation. 

$$\neg A \ \text{ implies } \ \mathit{not}\ A \qquad (2)$$



3.2 Coherence Principle 

The coherence principle of WFSX states that if there is evidence that A does not hold, 
then it should also be assumed by default that A does not hold. 

In a fuzzy setting, where formulas have truth values in [0, 1], there are various ways 
of stating the coherence principle, depending on what is understood by the statement 
“A holds”. If by “A holds”, we mean that A has a truth value greater than 0, then the 
coherence principle can be generally stated as 

$$\neg A : V,\ V > 0 \ \text{ implies } \ \mathit{not}\ A : V',\ V' \geq V \qquad (3)$$

This version of the coherence principle does not yet provide a definition of the truth 
value of an implicit negation $\mathit{not}\ A$ from the truth value of the explicit negation $\neg A$; 
there are two extreme cases of such a definition which comply with the fuzzy coherence 
principle (3). 



$$\neg A : V,\ V > 0 \ \text{ implies } \ \mathit{not}\ A : 1 \qquad (4)$$







This version states that implicit negation is always the extreme case of explicit 
negation: as long as there is any explicit evidence that A does not hold, then A does 
definitely not hold by default. 

$$\neg A : V,\ V > 0 \ \text{ implies } \ \mathit{not}\ A : V \qquad (5)$$

This version states a weak connection between explicit and implicit negation: if there 
is some explicit evidence that A does not hold, then A does not hold by default, with the 
same degree of evidence. 

All three of the above interpretations are valid in principle. However, there are 
arguments in favour of equation (4). The reason is that default negation is defined indirectly. 
Without explicit negation, positive information is given precedence over default 
negation, i.e. if there is the slightest evidence for $L$ ($L : V$, $V > 0$) then $\mathit{not}\ L$ should not 
hold ($\mathit{not}\ L : 0$). Applying this principle to explicit negation one arrives at equation (4), 
i.e. if there is the slightest evidence for $\neg L$ ($\neg L : V$, $V > 0$) then $\mathit{not}\ L$ should hold ($\mathit{not}\ L : 1$). 
This interpretation is in line with [21] and we will use it to define the argumentation 
semantics and proof procedure. But we will also indicate how to realise equation (5). 

Example 6. (Cont.) Since spam(agents) : 20%, the agent can derive ¬int(agents) : 
20%, despite the fact that int(agents) : 70%. Which conclusions should be drawn for 
not int(agents) and subsequently for ¬sub(agents)? As explained above, explicitly 
negative information takes precedence over positive information, hence avoiding spam 
gets highest priority. Following equation (4), not int(agents) : 1 and subsequently 
¬sub(agents) : 1 is concluded, whereas equation (5) leads to not int(agents) : 20% 
and subsequently ¬sub(agents) : 20%. 

3.3 Fuzzy Argumentation 

Having made these decisions on how to interpret fuzzy negation, it is now easy to 
extend the argumentation framework with fuzzy reasoning. The existing definitions of 
arguments do not have to be adapted; we only add on top of them a definition of strength, 
which extends the fuzzy truth values given for facts to rules and arguments. 

Definition 7. (Strength) Let $P$ be an extended logic program, let $str(L \leftarrow true)$ be the 
strength of fuzzy facts $L \leftarrow true \in P$, and let $A$ be an argument. Then the strength 
$str(r, A)$ of a rule $r$ in $A$ is defined inductively as: 

- Fact: $str(r, A) = str(L \leftarrow true)$, if $r = L \leftarrow true$ is a fuzzy fact 

- Rule: $str(r, A) = \min\{str(r_{i_1}, A), \ldots, str(r_{i_n}, A)\}$, if 
$r = L \leftarrow L_1, \ldots, L_n, \mathit{not}\ L_{n+1}, \ldots, \mathit{not}\ L_m$ 
is a rule, where $r_{i_k}$ is the rule in $A$ with conclusion $L_k$. 

Based on the strength of a rule, we define: 

- Argument: The strength $str(A)$ of a non-empty argument $A = [r_1, \ldots, r_n]$ is defined 
as $str(A) = str(r_1, A)$. 







- Conclusion: The strength $str(L)$ of an objective literal $L$ is defined as the maximum 
strength of all justified arguments for $L$, i.e. 

$$str(L) = \max\{str(A) \mid A \text{ is a justified argument for } L\}$$

The strength of a default literal is 1 if $L$ is not justified, and otherwise equal to the 
maximum strength of any justified argument for $\neg L$, i.e. 

$$str(\mathit{not}\ L) = \begin{cases} 1 & \text{if } L \text{ is not justified} \\ \max\{str(A) \mid A \text{ is a justified argument for } \neg L\} & \text{else} \end{cases}$$



Example 7. (Cont.) With the above definitions, we can compute the strength of justified 
arguments (a stands for agents). For example: 

str(¬sub(a)) = max{str([¬sub(a) ← not int(a)])} = 1 
str(¬int(a)) = max{str([¬int(a) ← spam(a); spam(a) : 20%])} = 20% 
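
The inductive definition of strength can be sketched as a short recursive function. This is an illustration under stated assumptions: rules are triples (head, objective body literals, default body literals), the input is a well-formed argument in the sense of Definition 2, and a rule whose body contains only default literals gets strength 1 (the minimum over an empty set), which is what Example 7 uses. The function names are hypothetical.

```python
def rule_strength(i, arg, fact_strength):
    """Strength of the i-th rule of an argument, following Definition 7."""
    head, pos, naf = arg[i]
    if not pos and not naf:
        # A fuzzy fact carries its given truth value.
        return fact_strength[head]
    strengths = []
    for lit in pos:
        # Find the later rule in the argument that concludes this body literal.
        k = next(j for j in range(i + 1, len(arg)) if arg[j][0] == lit)
        strengths.append(rule_strength(k, arg, fact_strength))
    # Assumption: default literals do not lower the strength (equation (4)),
    # so a body without objective literals yields strength 1.
    return min(strengths, default=1.0)


def argument_strength(arg, fact_strength):
    """The strength of an argument is the strength of its first rule."""
    return rule_strength(0, arg, fact_strength)


facts = {"spam(a)": 0.2}
arg_not_sub = [("-sub(a)", (), ("int(a)",))]
arg_not_int = [("-int(a)", ("spam(a)",), ()), ("spam(a)", (), ())]
```

Calling `argument_strength(arg_not_sub, facts)` and `argument_strength(arg_not_int, facts)` reproduces the two values of Example 7.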

As argued above the definition of strength implements the interpretation of fuzzy 
negation put forward in equation (4). To implement equation (5) instead one needs to 
consider the minimal strength of all body literals of a rule. 

In this setting, fuzzy truth values are simply a layer on top of the existing non-fuzzy 
reasoning process. It is therefore straightforward to extend the proof procedure for WFSX 
[29] to compute the strength of an argument. We omit it here, and instead present the 
slightly more involved proof procedure for fuzzy unification and argumentation in detail. 



4 Fuzzy Unification 

Fuzzy reasoning is important to reason about vague and uncertain concepts and infor- 
mation. However, in open systems there is an additional problem of misunderstandings 
between agents: predicates and terms may be missing or mismatch. Fuzzy unification 
addresses this problem by introducing a degree of unification ranging from a full match 
to a complete mismatch. As such, it is a concept of value for any agent architecture 
resting on a knowledge system and communication. It can readily be integrated into any 
system which deploys for example KQML or FIPA ACL. To use our fuzzy unification, an 
agent system only needs a knowledge system, which caters for fuzziness. To show how 
this can in principle be done, we develop this concept and embed our fuzzy unification 
in the argumentation framework introduced in section 2, which in turn is an example of 
a negotiation mechanism as specified in e.g. FIPA ACL call-for-proposals speech act. 

Earlier we defined the syntax of extended logic programs resting on the notion of 
atoms. To define fuzzy unification we have to define what an atom is. 

Definition 8. (String) Symbols are strings, where a string is either the empty string $\varepsilon$ or 
a string $a.A$, where $a$ is a character and $A$ is a string. $|A|$ denotes the length of $A$. 







Definition 9. (Atom) Let $V$ be a set of variable names, $F$ a set of function symbols and 
$P$ a set of predicate symbols. Both $F$ and $P$ include a unique empty function/predicate 
symbol $\varepsilon$. 

The set of terms is defined inductively. Every variable $x \in V$ is a term. Let $f \in F$ be 
a function symbol of arity $n$ (if $n = 0$, $f$ is also called a constant) and $t_1, \ldots, t_n$ terms; 
then $f(t_1, \ldots, t_n)$ is a term. Nothing else is a term. Let $p \in P$ be a predicate symbol of 
arity $n$ and $t_1, \ldots, t_n$ terms; then $p(t_1, \ldots, t_n)$ is an atom. 

Note that our definition of an atom differs from the standard definition in that we 
include an empty function and predicate symbol. Thus there exists an empty term $\varepsilon$. We 
treat $\varepsilon(f(a))$ as equivalent to $f(a)$. 

Definition 10. (Unifier [31]) A substitution is a replacement of variables by terms. 
A substitution $sub$ is a unifier of two literals $L, L'$ if $L\,sub = L'\,sub$. A unifier $sub$ of 
$L, L'$ is the most general unifier (MGU) of $L, L'$ if for every other unifier $sub'$ of $L, L'$ 
there is a substitution $s$ such that $sub' = sub\ s$. 

The MGU can be computed using Robinson’s unification algorithm [31]. 

Example 8. The predicates subscribe(x) and subscribe(agents) unify and the MGU is 
[x/agents]. For various reasons, none of the following pairs of predicates unify: 

- subscribe(x) and subscribe(f(x)), because x occurs in f(x), which would lead to 
a circular substitution; 

- subscribe(agents) and subscribe(agents, digest), as the predicates do not have the 
same number of parameters; 

- subscribe(agents) and subskribes(agents), as the predicate names slightly 
mismatch; 

- subscribe(agents) and subscribe(agent), as the terms slightly mismatch. 

In classical unification predicates unify or they do not; we introduce a degree of 
unification ranging from a complete match (degree 0) as in classical unification to a 
complete mismatch (degree 1). Previous work by Arcelli, Formato, and Gerla developed 
a general abstract framework for fuzzy unification, quotient unification, and unification 
as negotiation [32]. In this paper, we use an alternative approach for fuzzy unification 
based on edit distance [33]. The concept of edit distance has a well established history 
dating back to the 60s and 70s [19] and is still widely used, for example, in bioinformatics 
to compare genomic sequences. The edit distance between two strings A and B is defined 
as the minimal number of delete, add, and replace operations to convert A into B. The 
basic principle of edit distance is well-understood, but to employ it for fuzzy unification 
there are three requirements: 

First, a normalisation is required to be able to compare strings independent of their 
size. A few mismatches of short strings can be worse than some mismatches of two long 
strings. Second, the definition of edit distance has to be extended to deal with general tree 
structures representing the predicates and terms to be compared. Third, for compatibility 
reasons fuzzy unification should be an extension of classical unification. 







4.1 Edit Distance 

In this section, we set out to broaden the principles of unification to encompass fuzzy 
matches of predicate and function symbols and to deal with mismatching arguments. We 
need a comparison measure to define how similar two symbols are. As argued above, 
edit distance is a suitable basis for this purpose. Instead of defining edit distance as 
the minimal number of add, delete, and replace operations needed to transform one string 
into another, one can define it recursively: two strings are compared by dropping the first 
character of either string, or of both, at a penalty of 1, or by dropping both first characters 
at no penalty if they match. 

Definition 11. (Edit Distance) Let $A, B$ be strings and $a, b$ characters, then 

$$\begin{aligned}
e(A, \varepsilon) &= e(\varepsilon, A) = |A| \\
e(a.A, b.B) &= \min\{e(A, b.B) + 1,\ e(a.A, B) + 1,\ e(A, B) + 1,\ e(A, B) \text{ if } a = b\}
\end{aligned}$$



Example 9. e(subscribe, subskribes) = 2 and e(sub, abb) = 2. 
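
The recursive definition translates almost literally into code. The sketch below follows Definition 11 case by case; memoisation via `functools.lru_cache` is an implementation choice added here to avoid recomputing sub-problems, not something the definition prescribes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def e(a: str, b: str) -> int:
    """Edit distance between strings a and b, following Definition 11."""
    if not a or not b:
        # e(A, empty) = e(empty, A) = |A|
        return len(a) + len(b)
    options = [
        e(a[1:], b) + 1,       # drop the first character of a
        e(a, b[1:]) + 1,       # drop the first character of b
        e(a[1:], b[1:]) + 1,   # drop both first characters at a penalty
    ]
    if a[0] == b[0]:
        options.append(e(a[1:], b[1:]))  # matching characters cost nothing
    return min(options)
```

For instance, `e("subscribe", "subskribes")` and `e("sub", "abb")` both evaluate to 2, as stated in Example 9.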



Although the first example has seven letters in common and the second only one, 
both edit distances amount to 2. Therefore, there is a need to normalise edit distance to 
judge the penalties for mismatches relative to the size of the strings. Such a normalised 
edit distance should range from 0 (no mismatches) to 1 (no matches). 



Definition 12. (Normalised Edit Distance) Let $A, B$ be strings and at least one of them 
non-empty, then 

$$ne(A, B) = \frac{e(A, B)}{\max(|A|, |B|)}$$

is the normalised edit distance. 



Example 10. With normalisation, we obtain 

$$ne(\mathit{subscribe}, \mathit{subskribes}) = \frac{2}{10} = 0.2 \quad \text{and} \quad ne(\mathit{sub}, \mathit{abb}) = \frac{2}{3} \approx 0.67$$
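
A small sketch of the normalised edit distance follows. It computes $e$ with the standard row-by-row dynamic-programming table, which is an equivalent formulation chosen here for brevity, and it raises an error when both strings are empty because Definition 12 leaves that case undefined.

```python
def e(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # replace or match
        prev = cur
    return prev[-1]


def ne(a: str, b: str) -> float:
    """Normalised edit distance: 0 is a perfect match, 1 a complete mismatch."""
    longest = max(len(a), len(b))
    if longest == 0:
        raise ValueError("ne is undefined for two empty strings")
    return e(a, b) / longest
```

With these definitions, `ne("subscribe", "subskribes")` gives 0.2 and `ne("sub", "abb")` gives 2/3.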



4.2 Fuzzy Unification 

While normalised edit distance is well suited to compare symbols, we want to deal with 
predicates and terms, which have a tree structure. Therefore, we have to extend our 
definition. It is very important that for the purpose of comparison there is no difference 
between a tree structure of a predicate and of terms. Hence, we do not distinguish between 
predicate and function symbols, and in the remainder t often denotes both a predicate 
or a term. Please note also that we include the empty symbol $\varepsilon$ as predicate or function 
symbol and we do not distinguish between $\varepsilon(t)$ and $t$ for a term $t$. 

To define fuzzy unification, we have to recursively traverse the tree representing the 
predicates and terms. In Definition 13 of edit distance over trees et, the first returned 
parameter is the number of mismatches, the penalty; the second is the accumulated 
substitution; the third is a factor for normalisation: the sum of the maximal nodes of the 
pairwise node comparisons in the recursive traversal. But let us consider this recursive 
definition in detail: Any term perfectly mismatches the empty symbol, which is penalised 
with the maximum value - the size of the term. Two variables as well as a variable and 
a term perfectly match, which is captured by a fuzzy factor of 0 and the corresponding 
substitutions. Note that for the latter an occurs check is performed. Predicate or function 
symbols do not contain any further structure and therefore their fuzzy unification factor is 
given by the edit distance e. For the purpose of normalisation we use here the maximum 
length of the two symbols. In the core of the definition, we reduce two predicates or terms 
t, t' and call the edit distance over trees recursively. To the edit distance of the leading 
predicate or function symbol we add the minimum distance after dropping the first term(s) 
and adding the penalty of the dropped term(s). Thus, the definition compares terms from 
left to right, dropping terms of either $t$, $t'$, or both $t$ and $t'$. The results of these decompositions 
are added up using $\oplus$, which adds numbers and concatenates substitutions. 

Definition 13. (Fuzzy Unification) Let $t = f(t_1, \ldots, t_n)$ and $t' = f'(t'_1, \ldots, t'_m)$ be 
two terms or predicates, and let $x, y \in V$ be variables. The size of a term or predicate 
is defined as: $size(x) = size(\varepsilon) = 0$, $size(f) = |f|$, and 
$size(f(t_1, \ldots, t_n)) = |f| + \sum_{i=1,\ldots,n} size(t_i)$. 

The edit distance over trees $et$ maps two terms or predicates to a tuple of the number 
of mismatches, a unifier, and a normalisation factor: 

$$\begin{aligned}
et(t, \varepsilon) &= (size(t), [\,], size(t)) \\
et(x, y) &= (0, [x/y], 0) \\
et(x, t) &= (0, [x/t], 0) \quad \text{if } x \text{ not in } t \text{ and } t \notin V \\
et(f, f') &= (e(f, f'), [\,], \max\{|f|, |f'|\}) \\
et(t, t') &= et(f, f') \oplus \min\nolimits_u \{ \\
&\qquad et((t_2, \ldots, t_n), (t'_1, \ldots, t'_m)) \oplus et(t_1, \varepsilon), \\
&\qquad et((t_1, \ldots, t_n), (t'_2, \ldots, t'_m)) \oplus et(t'_1, \varepsilon), \\
&\qquad et((t_2, \ldots, t_n), (t'_2, \ldots, t'_m)) \oplus et(t_1, t'_1) \}
\end{aligned}$$

where $(u, s, n) \oplus (u', s', n') = (u + u', s\,s', n + n')$ and $\min_u$ returns the triple with 
minimal first component. 

$et$ is called edit distance over trees. The normalised edit distance over trees $net(t, t') = 
(\frac{u}{n}, s)$ with $(u, s, n) = et(t, t')$ is called fuzzy unification. For convenience, we often use 
$net$ to refer only to its first component. 

Example 11. Consider Example 8, where unification failed because of mismatching 
predicate and function symbols or missing parameters. With fuzzy unification, we obtain 

$$net(\mathit{subscribe}(\mathit{agents}), \mathit{subscribe}(\mathit{agents}, \mathit{digest})) = \frac{6}{9+6+6} = \frac{2}{7} \approx 0.29$$

as the argument digest cannot be matched; 

$$net(\mathit{subscribe}(\mathit{agents}), \mathit{subskribes}(\mathit{agents})) = \frac{2}{10+6} = \frac{1}{8} = 0.125$$

as the predicate names mismatch; 







$$net(\mathit{subscribe}(\mathit{agents}), \mathit{subscribe}(\mathit{agent})) = \frac{1}{9+6} \approx 0.067$$

as the terms slightly mismatch. 
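
The edit distance over trees can be prototyped compactly. The sketch below is one possible reading of Definition 13 under several assumptions that the paper does not state: variables are instances of a hypothetical `Var` class, every other term is a pair `(symbol, args)`, a variable is also allowed on the right-hand side, a failed occurs check raises an error instead of trying alternatives, ties in the minimum are broken in favour of the first option, and a normalisation factor of 0 yields degree 0.

```python
from functools import lru_cache


class Var(str):
    """A variable; every other term is a pair (symbol, tuple of argument terms)."""


@lru_cache(maxsize=None)
def e(a, b):
    """Edit distance between two symbols (Definition 11)."""
    if not a or not b:
        return len(a) + len(b)
    options = [e(a[1:], b) + 1, e(a, b[1:]) + 1, e(a[1:], b[1:]) + 1]
    if a[0] == b[0]:
        options.append(e(a[1:], b[1:]))
    return min(options)


def size(t):
    if isinstance(t, Var):
        return 0
    symbol, args = t
    return len(symbol) + sum(size(arg) for arg in args)


def occurs(x, t):
    if isinstance(t, Var):
        return x == t
    return any(occurs(x, arg) for arg in t[1])


def plus(p, q):
    """Add penalties and normalisers, concatenate substitutions."""
    return (p[0] + q[0], p[1] + q[1], p[2] + q[2])


def drop(t):
    """et(t, empty): the whole term counts as a mismatch."""
    return (size(t), [], size(t))


def et(t, u):
    if isinstance(t, Var) or isinstance(u, Var):
        x, other = (t, u) if isinstance(t, Var) else (u, t)
        if isinstance(other, Var) or not occurs(x, other):
            return (0, [(x, other)], 0)
        raise ValueError("occurs check failed")  # assumption: no fallback
    (f, ts), (g, us) = t, u
    return plus((e(f, g), [], max(len(f), len(g))), et_args(ts, us))


def et_args(ts, us):
    """Compare two argument lists from left to right."""
    if not ts or not us:
        rest = sum(size(t) for t in ts + us)  # all remaining terms are dropped
        return (rest, [], rest)
    options = [
        plus(et_args(ts[1:], us), drop(ts[0])),
        plus(et_args(ts, us[1:]), drop(us[0])),
        plus(et_args(ts[1:], us[1:]), et(ts[0], us[0])),
    ]
    return min(options, key=lambda triple: triple[0])


def net(t, u):
    penalty, subst, norm = et(t, u)
    return (penalty / norm if norm else 0.0, subst)


def const(name):
    return (name, ())
```

For example, `net(("subscribe", (const("agents"),)), ("subscribe", (const("agents"), const("digest"))))` has first component 6/21, the first value of Example 11. The unmemoised three-way recursion in `et_args` is exponential in the number of arguments, which is acceptable only for small terms like these.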

Fuzzy unification lifts the normalisation by maximum size of the compared strings 
as introduced for the simple edit distance to the level of terms and predicates with 
a tree representation. An alternative to adding all mismatches and then normalising 
by the pairwise maximum length of the compared nodes is a direct normalisation of 
compared nodes using $ne$ and then redefining $\oplus$ to take the average. This has, however, 
the disadvantage of favouring short mismatches of parameters (see e.g. Examples 9 and 10), 
which our definition above does not suffer from. 

With the definition of net in place we can prove some of its properties. 

Theorem 1. Fuzzy unification is a conservative extension of unification, i.e. if s is an 
MGU for literals L, L' , then (0, s) is a fuzzy unifier for L , L' . 

Proof (Sketch). Literals $L$ and $L'$ unify iff all predicate and function symbols and all variables 
in $L$ and $L'$, respectively, have edit distance 0. 



4.3 Fuzzy Unification and Argumentation 

To embed fuzzy unification into our argumentation framework we have to replace uni- 
fication by fuzzy unification. Unification appears at two stages: once when rules are 
“chained” together to form an argument and once when arguments attack each other. 
Let us consider this in detail. 

- Arguments as introduced in Definition 2 are partial proofs, where for every objective 
literal $L$ in the body of a rule $r_i$ there is a $k > i$ such that $L = head(r_k)$. This 
unification of $L$ and the head of $r_k$ needs to be replaced by fuzzy unification. 

- Attacks between arguments require the literals involved to unify. We will relax this 
requirement to fuzzy unification. 



Definition 14. (Fuzzily Unified Argument) Let $P$ be an extended logic program, and 
$0 \leq U \leq 1$ a maximum unification value, which needs to be met for two literals to unify 
($U = 0$ is a complete match, $U = 1$ a complete mismatch). A fuzzily unifying argument 
for $P$ of maximum unification value $U$ is a finite sequence $A = [r_1, \ldots, r_n]$ of ground 
instances of rules $r_i \in P$ such that 

- for every $1 \leq i \leq n$, for every objective literal $L$ in the body of $r_i$ there is a $k > i$ 
such that $net(L, head(r_k)) \leq U$. 

We say that $A$'s unification value is $U'$, where $U' \leq U$ is the minimal maximum 
unification value for which $A$ still fuzzily unifies. 







Example 12. (Cont.) Consider the program 

subscribe(agent) ← interest(agents).    interesting(agents). 

Let the maximal unification value be U = 0.01; then there is no fuzzily unifying 
argument. If we increase this value to, e.g., U = 0.2, then 

A = [subscribe(agent) ← interest(agents); interesting(agents)] 

is a fuzzily unifying argument. Indeed, A has unification value U' = 3/(11+6) ≈ 0.176, 
since interest and interesting differ by three characters. 

Since the only difference between an argument as defined in Definition 2 and a fuzzily 
unified argument is the replacement of unification by fuzzy unification, we will use 
the two terms interchangeably. This applies in particular to all definitions of section 2 
using arguments. Next, we have to adapt the notions of attack. 

Definition 15. (Fuzzily Unifying Attacks) Let $P$ be an extended logic program, and 
$0 \leq U \leq 1$ a unification value. Let $A_1$ and $A_2$ be arguments, then 

- $A_1$ fu-undercuts $A_2$ iff $A_1$ has a conclusion $L$ and $A_2$ has an assumption $\mathit{not}\ L'$, 
and $net(L, L') \leq U$. The binary relation of fu-undercuts is denoted by fu-u. 

- $A_1$ fu-rebuts $A_2$ iff $A_1$ has a conclusion $L$ and $A_2$ has a conclusion $\neg L'$, and 
$net(L, L') \leq U$. 

- $A_1$ fu-attacks $A_2$ iff $A_1$ fu-undercuts or fu-rebuts $A_2$. The binary relation of 
fu-attacks is denoted by fu-a. 



Example 13. Let U = 0.1 be a unification value, then argument 

A = [subscribe(agents) ← interest(agents), not spam(agents); interest(agents)] 

is fu-undercut by the argument B = [spam(agent)], since 

$$net(\mathit{spam}(\mathit{agents}), \mathit{spam}(\mathit{agent})) = \frac{1}{4+6} = 0.1$$

If U is lowered to, say, 0.01, the undercut does not apply any longer. 
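
The threshold test behind a fuzzily unifying undercut can be sketched as follows. This is a deliberately simplified illustration, not Definition 13 in full: it assumes ground atoms encoded as tuples `(predicate, constant, ...)` of equal arity and compares them position by position, which coincides with the tree-based definition for the unary atoms of this example but not in general. The function names are hypothetical.

```python
def e(a, b):
    """Edit distance via the standard dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def net(p, q):
    """Simplified degree of unification for flat ground atoms of equal arity."""
    if len(p) != len(q):
        raise ValueError("this sketch only compares atoms of equal arity")
    penalty = sum(e(x, y) for x, y in zip(p, q))
    norm = sum(max(len(x), len(y)) for x, y in zip(p, q))
    return penalty / norm


def fu_undercuts(conclusions, assumptions, max_unification):
    """True if some conclusion fuzzily matches some assumed-false literal."""
    return any(
        net(c, a) <= max_unification
        for c in conclusions
        for a in assumptions
        if len(c) == len(a)
    )


conclusions_of_B = [("spam", "agent")]
assumptions_of_A = [("spam", "agents")]
```

With these inputs, `fu_undercuts(conclusions_of_B, assumptions_of_A, 0.1)` holds, while lowering the threshold to 0.01 makes the undercut disappear.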

With the definition of arguments and notions of attack adapted to fuzzy unification 
all other definitions (acceptability, justified argument) stay the same. Thus we have 
a declarative definition for an argumentation process with fuzzy unification. In the next 
section we adapt the proof procedure accordingly. 

4.4 Proof Procedure 

We modify the proof procedure for WFSX [29], [17] to compute fu-u/fu-a-justified 
arguments. The proof procedure is based on trees: in a T-tree a literal is proved true, 
while in a TU-tree a literal is proved not false. The modified T/TU-trees need to capture 
the fuzzy unification of literals. We achieve this by labelling nodes with pairs of literals 
$(L, L')$, where $L$ is the literal we are seeking to expand and $L'$ is a suitable matching 
literal, so that the fuzzy unification of $L$ and $L'$ does not exceed the required maximum. 




Fuzzy Unification and Argumentation for Well-Founded Semantics 






Definition 16. (Fuzzily Unifying T-tree, TU-tree) Let P be a ground extended logic
program and $0 \leq U \leq 1$ a unification value. A fuzzily unifying T-tree (resp. TU-tree) of
maximum unification value U for a literal L is an and-tree whose nodes are labelled by
pairs of literals $(L', L'')$, where $L''$ is undefined if the node is a leaf; the first component
of the label of the root is L. Fuzzily unifying T-trees (resp. TU-trees) are constructed
top-down starting from the root by successively expanding new nodes using the following
rules:

1. If n is a node whose first label is an objective literal L, then

   - if there is a rule $L' \leftarrow L_1, \ldots, L_m, \mathit{not}\ L_{m+1}, \ldots, \mathit{not}\ L_n$ in P such that
     $\mathit{net}(L, L') \leq U$, then

     • n is labelled $(L, L')$ and

     • the successors of n are nodes with first labels
       $L_1, \ldots, L_m, \mathit{not}\ L_{m+1}, \ldots, \mathit{not}\ L_n$ in a fuzzily unifying T-tree,

     • while in a fuzzily unifying TU-tree there are, additionally, the successor
       nodes with first labels $\mathit{not}\ \neg L_1, \ldots, \mathit{not}\ \neg L_m$.

   - If no rule for $L'$ with $\mathit{net}(L, L') \leq U$ exists, then n is a leaf with the second
     label undefined.

2. Nodes whose first labels are default literals are leaves, and their second label is
undefined.

Definition 17. (Successful or failed tree) A fuzzily unifying T- or TU-tree with maximum
unification value $0 \leq U \leq 1$ is either successful with unification value $U' \leq U$, or it
fails. All infinite trees are failed. A tree is successful with unification value $U'$ (resp.
failed) if its root is successful with unification value $U'$ (resp. failed). Nodes are marked
as follows:

1. A leaf whose first label is an objective literal, and whose second label is undefined,
is failed.

2. A leaf labelled with a default literal $\mathit{not}\ L$ is successful with unification value 0 in
a fuzzily unifying T-tree (TU-tree) if

   a) all fuzzily unifying TU-trees (T-trees) for L are failed, or

   b) there is a successful fuzzily unifying T-tree for $\neg L$.

   Otherwise it is labelled as failed.

3. An intermediate node n of a fuzzily unifying T-tree (TU-tree) is successful with
unification value $U'$ if its children $t_1, \ldots, t_m$ are successful with unification values
$U'_1, \ldots, U'_m$, and $U' = \max\{U'_1, \ldots, U'_m\}$. It is failed otherwise.

All remaining nodes are labelled failed in fuzzily unifying T-trees and successful with
unification value 0 in fuzzily unifying TU-trees.

The following theorem states that the proof procedure is sound and complete. 

Theorem 2. T/TU-trees and justified arguments. Let $0 \leq U \leq 1$, P a ground, possibly
infinite, extended logic program, and L an objective literal. Then

- There exists a successful fuzzily unifying T-tree for L with maximal unification value 
U iff there exists a justified fuzzily unifying argument for L with maximal unification 
value U. 







R. Schweimeier and M. Schroeder 



- There exists a successful fuzzily unifying TU-tree for L with maximal unification 
value U iff there exists a fuzzily unifying argument for L with maximal unification 
value U, which is not overruled. 

Proof (Sketch). First, note that fuzzily unifying T-trees with root labelled L are in
one-to-one correspondence with arguments for L; fuzzily unifying TU-trees for L are in
one-to-one correspondence with arguments for L, where in each rule each body literal
$L'$ is complemented by $\mathit{not}\ \neg L'$. For successful trees, the unification value of the tree is
equal to the unification value of the corresponding justified argument.

We define the rank of a fuzzily unifying T/TU-tree as the number of alternations
between T-trees and TU-trees in an attempt to show that the tree is successful. Similarly,
we define two kinds of rank for arguments: the T-rank of an argument is defined as the
number of undercuts and counter-attacks in establishing that an argument is justified.
The TU-rank of an argument is defined as the number of attacks and counter-undercuts
in establishing that an argument is defensible. Note that ranks of trees and arguments
may be infinite.

The proof is then by induction on the rank of a tree, showing that successful fuzzily 
unifying T-trees of rank n correspond to justified arguments of T-rank n, and successful 
TU-trees of rank n correspond to arguments of TU-rank n which are not overruled. 

A successful fuzzily unifying T-tree of rank n depends on the failure of fuzzily 
unifying TU-trees of rank < n; these correspond exactly to the undercuts (of rank < n) 
to the corresponding argument (of rank n); this is equivalent to saying that all undercuts 
are overruled (by induction hypothesis), i.e. the argument is justified. 

Similarly, a successful fuzzily unifying TU-tree of rank n depends on the failure of
fuzzily unifying T-trees of rank < n, corresponding exactly to undercuts and rebuts (of
rank < n); this is equivalent to saying that no attack is justified (by induction hypothesis),
i.e. the argument is not overruled.

Having established the relationship between the declarative argumentation semantics
and the operational characterisation by proof trees, we can now turn to the important
result that both are conservative extensions of the well-founded semantics for extended
logic programs [17]:

Theorem 3. There exists a successful T-tree (TU-tree) for L iff there exists a successful
fuzzily unifying T-tree (TU-tree) for L of maximal unification value U = 0.

Proof. Follows immediately from the definition of justified arguments with fuzzy
unification, and from Theorem 1.

5 Comparison 

We have presented a framework for fuzzy unification and argumentation that caters
for both a declarative and an operational semantics. It provides expressive knowledge
representation with explicit negative information and fuzzy values, and it uses the
latter to cope with mismatched and missing parameters, which is vital for open agent
systems. Our argumentation framework relates to two strands of research, namely
argumentation as a paradigm for logic programming semantics and argumentation as a
paradigm for negotiating agents.







Argumentation and logic programming: Regarding logic programming, our approach
is based on earlier work by Bondarenko et al. [1], Dung [2], Prakken and Sartor [4],
and Pereira et al. [17]. A variety of argumentation semantics for logic programming
with default negation was proposed by [1] and related to existing semantics. This work
was applied to extended logic programming by [2], [4]. In [23], a general hierarchy of
different argumentation semantics for extended logic programming is presented, and
a particular argumentation semantics equivalent to WFSX [17] is identified. It is this
semantics on which we base the present work, mainly because there exists an efficient
proof procedure for WFSX [29].

Fuzzy unification: Our fuzzy unification is closely related to the work of Arcelli, Formato,
and Gerla [32], who develop an abstract framework for fuzzy unification and resolution. There
are important differences: First, [32] does not allow unification of predicates of different 
arity, which is however a problem often occurring in Prolog programming [34]. Second, 
[32] is not an extension of classical unification, which is important for compatibility 
reasons. Third, our work is based on a specific similarity measure, namely edit distance 
[19]. For our interpreter we need a normalised edit distance over trees. Although there has 
been some work on normalised edit distance [35], we had to modify this work because 
it does not deal with tree structures. 

Argumentation for negotiating agents: A number of authors [5], [6], [7], [8], [9], 
[10], [11] work on argumentation for negotiating agents. In line with [9] and unlike [6], 
[7], [8] we base our work on logic programming. A particular aim of our work is to show 
how to define arguing agents with a rich knowledge representation language and still be 
able to define goal-directed, top-down proof procedures. This is vital when implementing
systems that need to react in real time and therefore cannot afford to compute all
justified arguments, as would be required if a bottom-up argumentation semantics
were used.

As in [11], our fuzzy argumentation framework can be extended to reasoning with
multiple agents by introducing multiple contexts. 

6 Conclusion 

To summarise, the main contributions of this paper are as follows: 

- We designed a fuzzy argumentation framework which caters for an expressive knowl- 
edge representation including explicit and default negation and fuzzy truth values. 
We discussed the problem of defining fuzzy truth values in the light of negation 
and chose an interpretation, which allowed us to conservatively extend WFSX. As 
argued in the introduction, explicit negation is e.g. required in KQML and the abil- 
ity to deal with uncertainty in FIPA. Thus our framework is a good basis for agent 
communication languages. 

- We developed fuzzy unification to deal with a problem in open systems: how to
interact in the light of missing parameters and mismatches in parameters and
predicates. We realised fuzzy unification as normalised edit distance over trees. Our
approach is a conservative extension of classical unification and thus lends itself to
being easily incorporated into any open agent system that uses unification. As an
example, we showed how to incorporate it into the above argumentation framework,
by conservatively extending WFSX.

- Our argumentation framework provides both a sound theoretical basis and an
efficient implementation. The former is achieved through a declarative bottom-up
fixpoint semantics, the latter through a goal-directed, top-down proof procedure.



Acknowledgment. We would like to thank David Gilbert for discussion on fuzzy uni- 
fication. 



References 

1. Bondarenko, A., Dung, P.M., Kowalski, R., Toni, F.: An Abstract, Argumentation-Theoretic
Approach to Default Reasoning. Artificial Intelligence 93 (1997) 63-101

2. Dung, P.M.: An Argumentation Semantics for Logic Programming with Explicit Negation.
In: Proc. of the 10th International Conference on Logic Programming, MIT Press (1993)
616-630

3. Dung, P.M.: On the Acceptability of Arguments and Its Fundamental Role in Nonmonotonic
Reasoning, Logic Programming and n-Person Games. Artificial Intelligence 77 (1995)
321-357

4. Prakken, H., Sartor, G.: Argument-Based Extended Logic Programming with Defeasible
Priorities. Journal of Applied Non-Classical Logics 7 (1997) 25-75

5. Kraus, S., Sycara, K., Evenchik, A.: Reaching Agreements through Argumentation: A Logical
Model and Implementation. Artificial Intelligence 104 (1998) 1-69

6. Parsons, S., Jennings, N.: Negotiation through Argumentation - a Preliminary Report. In: Proc.
Second Int. Conf. on Multi-Agent Systems, Kyoto, Japan (1996) 267-274

7. Sierra, C., Jennings, N., Noriega, P., Parsons, S.: A Framework for Argumentation-Based
Negotiation. In: Proc. Fourth Int. Workshop on Agent Theories, Architectures and Languages
(ATAL-97), Springer-Verlag (1997) 167-182

8. Parsons, S., Sierra, C., Jennings, N.: Agents that Reason and Negotiate by Arguing. Journal
of Logic and Computation 8 (1998) 261-292

9. Sadri, F., Toni, F., Torroni, P.: Logic Agents, Dialogue, Negotiation - An Abductive Approach.
In: Proceedings of the AISB Symposium on Information Agents for E-commerce (2001)

10. Torroni, P.: A Study on the Termination of Negotiation Dialogues. In: Proceedings of
Autonomous Agents and Multi Agent Systems 2002, ACM Press (2002) 1223-1230

11. Schroeder, M.: An Efficient Argumentation Framework for Negotiating Autonomous Agents.
In: Proceedings of Modelling Autonomous Agents in a Multi-Agent World MAAMAW99,
LNAI 1647, Springer-Verlag (1999)

12. Wagner, G.: Foundations of Knowledge Systems with Applications to Databases and Agents.
Kluwer Academic Publishers (1998)

13. Schroeder, M., Wagner, G.: Vivid Agents: Theory, Architecture, and Applications. Journal of
Applied Artificial Intelligence 14 (2000) 645-676

14. Finin, T., Fritzson, R., McKay, D., McEntire, R.: KQML as an Agent Communication
Language. In: Proceedings of the Third International Conference on Information and Knowledge
Management (CIKM'94), ACM Press (1994) 456-463

15. Chiariglione, L., et al.: Specification Version 2.0. Technical Report, Foundations of Intelligent
Physical Agents (1997) http://www.fipa.org

16. Nwana, H., Ndumu, D.: A Perspective on Software Agents Research. The Knowledge
Engineering Review 14 (1999) 125-142

17. Alferes, J.J., Pereira, L.M.: Reasoning with Logic Programming. (LNAI 1111),
Springer-Verlag (1996)

18. Gelfond, M., Lifschitz, V.: The Stable Model Semantics for Logic Programming. In:
Kowalski, R.A., Bowen, K.A. (eds.): 5th International Conference on Logic Programming. MIT
Press (1988) 1070-1080

19. Levenshtein, V.: Binary Codes Capable of Correcting Deletions, Insertions, and Reversals.
Doklady Akademii nauk SSSR (in Russian) 163 (1965) 845-848. Also in Cybernetics and
Control Theory 10(8) (1966) 707-710

20. Codd, E.F.: A Relational Model of Data for Large Shared Data Banks. Communications of
the ACM 13 (1970) 377-387

21. Wagner, G.: Negation in Fuzzy and Possibilistic Logic Programs. In: Logic Programming
and Soft Computing. Research Studies Press (1998)

22. Gelfond, M., Lifschitz, V.: Logic Programs with Classical Negation. In: Proc. of ICLP90,
MIT Press (1990) 579-597

23. Schweimeier, R., Schroeder, M.: A Parametrised Hierarchy of Argumentation Semantics for
Extended Logic Programming and Its Application to the Well-Founded Semantics. Theory
and Practice of Logic Programming (2004) To appear.

24. Sterling, L., Shapiro, E.: The Art of Prolog. MIT Press (1986)

25. van Gelder, A., Ross, K.A., Schlipf, J.S.: The Well-Founded Semantics for General Logic
Programs. Journal of the ACM 38 (1991) 620-650

26. Chesnevar, C.I., Maguitman, A.G., Loui, R.P.: Logical Models of Argument. ACM Computing
Surveys 32 (2000) 337-383

27. Tarski, A.: A Lattice-Theoretical Fixpoint Theorem and Its Applications. Pacific Journal of
Mathematics 5 (1955) 285-309

28. Birkhoff, G.: Lattice Theory. 3rd edn. American Mathematical Society (1967)

29. Alferes, J.J., Damasio, C.V., Pereira, L.M.: A Logic Programming System for Non-Monotonic
Reasoning. Journal of Automated Reasoning 14 (1995) 93-147

30. Zadeh, L.: A Theory of Approximate Reasoning. Machine Intelligence 9 (1979) 149-194

31. Robinson, J.A.: A Machine Oriented Logic Based on the Resolution Principle. Journal of the
ACM 12 (1965) 23-42

32. Fontana, F.A., Formato, F., Gerla, G.: Fuzzy Unification as a Foundation of Fuzzy Logic
Programming. In: Logic Programming and Soft Computing. Research Studies Press (1998)
51-68

33. Gilbert, D., Schroeder, M.: FURY: Fuzzy Unification and Resolution Based on Edit Distance.
In: Proc. 1st International Symposium on Bioinformatics and Biomedical Engineering (2000)
330-336

34. Fung, P., Brayshaw, M., du Boulay, B., Elsom-Cook, M.: Towards a Taxonomy of
Misconceptions of the Prolog Interpreter. In: Brna, P., du Boulay, B., Pain, H. (eds.): Learning to Build
and Comprehend Complex Information Structures: Prolog as a Case Study. Ablex (1999)

35. Vidal, E., Marzal, A., Aibar, P.: Fast Computation of Normalized Edit Distances. IEEE
Transactions on Pattern Analysis and Machine Intelligence 17 (1995) 899-902




Tree Signatures and Unordered XML Pattern Matching 



Pavel Zezula 1 , Federica Mandreoli 2 , and Riccardo Martoglia 2 



1 Masaryk University, Brno, Czech Republic
zezula@fi.muni.cz

2 University of Modena and Reggio Emilia, Modena, Italy
{mandreoli.federica, martoglia.riccardo}@unimo.it



Abstract. We propose an efficient approach for finding relevant XML data twigs 
defined by unordered query tree specifications. We use the tree signatures as the 
index structure and find qualifying patterns through integration of structurally 
consistent query path qualifications. An efficient algorithm is proposed and its 
implementation tested on real-life data collections. 



1 Introduction 

With the rapidly increasing popularity of XML for data representation, there is a lot of 
interest in query processing over data that conform to the labelled-tree data model. The 
idea behind evaluating tree pattern queries, sometimes called the twig queries, is to find 
all existing ways of embedding the pattern in the data. Since XML data collections can 
be very large, efficient evaluation techniques for tree pattern matching are needed. 

From the formal point of view, XML data objects can be seen as ordered labelled 
trees. Following this model, previous approaches considered also the query trees ordered, 
so the problem can be characterized as the ordered tree pattern matching. Though there
are certainly situations where the ordered tree pattern matching perfectly reflects the
information needs of users, there are many others where users would prefer to consider query
trees as unordered. For example, when searching for a twig of the element person with
the subelements first name and last name (possibly with specific values), ordered 
matching would not consider the case where the order of the first name and the last 
name is reversed. However, this could exactly be the person we are searching for. The 
way to solve this problem is to consider the query twig as an unordered tree in which each 
node has a label and where only the ancestor-descendant relationships are important - 
the preceding-following relationships are unimportant. 

In general, the process of unordered tree matching is difficult and time consuming. 
For example, the edit distance on unordered trees was shown in [5] to be NP-hard. To improve
efficiency, an approximate searching for nearest neighbors, called ATreeGrep, was pro- 
posed in [3]. However, the problem of unordered twig pattern matching in XML data 
collections has not been studied, to the best of our knowledge. 

In this paper we propose an efficient evaluation of the unordered tree matching. We 
use the tree signature approach [4], which has originally been proposed for the ordered 
tree matching. In principle, we decompose the query tree into a collection of root to leaf 
paths and search for their embedding in the data trees. Then we join the structurally 
consistent path qualifications to find unordered query tree inclusions in the data. 



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 122-139, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004







The rest of the paper is organized as follows. In Section 2, we summarize the concepts 
of tree signatures and define their properties that are relevant towards our objectives. In 
Section 3, we analyze the problem of unordered tree matching, and in Section 4 we 
propose an efficient algorithm to compute the query answer set. Performance evalua- 
tions are presented in Section 5. Final conclusions are in Section 6. 

2 Tree Signatures 

The idea of tree signatures proposed in [4] is to maintain a small but sufficient represen- 
tation of the tree structures able to decide the ordered tree inclusion problem for the XML 
data processing. As a coding schema, the preorder and postorder ranks [1] are used. In 
this way, tree structures are linearized, and extended string processing algorithms are 
applied to identify the tree inclusion. 

An ordered tree T is a rooted tree in which the children of each node are ordered.
If a node $v \in T$ has k children then the children are uniquely identified, left to right, as
$i_1, \ldots, i_k$. A labelled tree T associates a label (name) $t_v \in \Sigma$ (the domain of tree
node labels) with each node $v \in T$. If the path from the root to v has length n, we say
that the node v is on the level n, i.e. $\mathit{level}(v) = n$. Finally, $\mathit{size}(v)$ denotes the number
of descendants of v, i.e. the number of nodes in the subtree rooted at v excluding v itself;
the size of any leaf node is zero. In this section, we consider ordered labelled trees.

The preorder and postorder sequences are ordered lists of all nodes of a given tree T. 
In a preorder sequence, a tree node v is traversed and assigned its (increasing) preorder 
rank, pre(v), before its children are recursively traversed from left to right. In the
postorder sequence, a tree node v is traversed and assigned its (increasing) postorder rank,
post(v), after its children are recursively traversed from left to right. For illustration, see
the preorder and postorder sequences of our sample tree in Fig. 1 - the node’s position 
in the sequence is its preorder/postorder rank, respectively. 



            a
           / \
          b   f
         / \   \
        c   g   h
       / \     / \
      d   e   o   p

pre  : a b c d e g f h o p
post : d e c g b o p h f a
rank : 1 2 3 4 5 6 7 8 9 10



Fig. 1. Preorder and postorder sequences of a tree
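
As an illustration, the two rankings can be computed with a single depth-first traversal. The following Python sketch is only an assumed encoding: the dictionary `TREE` and the function `pre_post_ranks` are hypothetical names, and node names are taken to be unique, which holds for the sample tree but not for XML data in general.

```python
# Sample tree of Fig. 1 as a child-list dictionary (children listed left to right).
TREE = {
    "a": ["b", "f"],
    "b": ["c", "g"],
    "c": ["d", "e"],
    "f": ["h"],
    "h": ["o", "p"],
}


def pre_post_ranks(tree, root):
    """Return two dicts: node name -> preorder rank, node name -> postorder rank."""
    pre, post = {}, {}

    def visit(node):
        pre[node] = len(pre) + 1        # rank assigned before the children
        for child in tree.get(node, []):
            visit(child)
        post[node] = len(post) + 1      # rank assigned after the children

    visit(root)
    return pre, post


pre, post = pre_post_ranks(TREE, "a")
print("pre :", " ".join(sorted(pre, key=pre.get)))
print("post:", " ".join(sorted(post, key=post.get)))
```

Sorting the nodes by either rank reproduces the two sequences listed in Fig. 1.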

Given a node $v \in T$ with pre(v) and post(v) ranks, the following properties are
important towards our objectives: 







- all nodes x with $\mathit{pre}(x) < \mathit{pre}(v)$ are the ancestors or preceding nodes of v;

- all nodes x with $\mathit{pre}(x) > \mathit{pre}(v)$ are the descendants or following nodes of v;

- all nodes x with $\mathit{post}(x) < \mathit{post}(v)$ are the descendants or preceding nodes of v;

- all nodes x with $\mathit{post}(x) > \mathit{post}(v)$ are the ancestors or following nodes of v;

- for any $v \in T$, we have $\mathit{pre}(v) - \mathit{post}(v) + \mathit{size}(v) = \mathit{level}(v)$;

- if $\mathit{pre}(v) = 1$, v is the root; if $\mathit{pre}(v) = n$, v is a leaf. For all the other neighboring
nodes $v_i$ and $v_{i+1}$ in the preorder sequence, if $\mathit{post}(v_{i+1}) > \mathit{post}(v_i)$, $v_i$ is a leaf.
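
The properties above can be checked on the sample tree with a short Python sketch. The dictionaries `PRE` and `POST` hard-code the ranks of Fig. 1, and the function names `relation`, `size` and `level` are hypothetical helpers chosen for this illustration only.

```python
# Preorder and postorder ranks of the sample tree in Fig. 1.
PRE = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "g": 6, "f": 7, "h": 8, "o": 9, "p": 10}
POST = {"a": 10, "b": 5, "c": 3, "d": 1, "e": 2, "g": 4, "f": 9, "h": 8, "o": 6, "p": 7}


def relation(x, v):
    """Classify node x relative to the reference node v using only the ranks."""
    if x == v:
        return "self"
    if PRE[x] < PRE[v]:
        return "ancestor" if POST[x] > POST[v] else "preceding"
    return "descendant" if POST[x] < POST[v] else "following"


def size(v):
    """Number of descendants of v."""
    return sum(1 for x in PRE if relation(x, v) == "descendant")


def level(v):
    """Level of v from the identity pre(v) - post(v) + size(v) = level(v)."""
    return PRE[v] - POST[v] + size(v)


for kind in ("ancestor", "preceding", "descendant", "following"):
    print(kind, [x for x in PRE if relation(x, "g") == kind])
print({v: level(v) for v in PRE})
```

For the reference node g, the sketch reports a and b as ancestors, c, d and e as preceding nodes, no descendants, and f, h, o and p as following nodes.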

As proposed in [2], such properties can be summarized in a two dimensional dia- 
gram. See Fig. 2 for illustration, where the ancestor (A), descendant (D), preceding (P), 
and following (F) nodes of v are strictly located in the proper regions. Notice that in 
the preorder sequence all descendant nodes (if they exist) form a continuous sequence, 
which is constrained on the left by the reference node v and on the right by the first 
following node of v (or by the end of the sequence). The parent node of the reference is 
the ancestor with the highest preorder rank, i.e. the closest ancestor of the reference. 



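[Figure: the pre/post plane, with pre on the horizontal axis and post on the vertical
axis, both ranging up to n. Relative to the reference node v, the ancestors (A) lie in
the upper-left region, the following nodes (F) in the upper-right region, the preceding
nodes (P) in the lower-left region, and the descendants (D) in the lower-right region.]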



Fig. 2. Properties of the preorder and postorder ranks 



2.1 The Signature 

The tree signature is a list of entries for all nodes sorted in ascending preorder. In addition
to the node name, each entry also contains the node's position in the postorder sequence.



Definition 1. Let T be an ordered labelled tree. The signature of T is a sequence,
$\mathit{sig}(T) = \langle t_1, \mathit{post}(t_1);\ t_2, \mathit{post}(t_2);\ \ldots;\ t_n, \mathit{post}(t_n) \rangle$, of $n = |T|$ entries, where $t_i$ is a
name of the node with $\mathit{pre}(t_i) = i$. The $\mathit{post}(t_i)$ value is the postorder value of the node
named $t_i$.

Observe that the index i in the signature sequence is the preorder rank of $t_i$, so the value
of i serves actually two purposes. In the following, we use the term preorder if we mean
the rank of the node. When we consider the position of the node's entry in the signature
sequence, we use the term index. For example, $\langle a, 10;\ b, 5;\ c, 3;\ d, 1;\ e, 2;\ g, 4;\ f, 9;\ h, 8;\ o, 6;\ p, 7 \rangle$
is the signature of the tree from Fig. 1. The first signature element a is the
tree root. Leaf nodes in signatures are all nodes with postorder smaller than the postorder
of the following node in the signature sequence, that is nodes d, e, g, o - the last node,
in our example it is the node p, is always a leaf. We can also easily determine the level
of leaf nodes, because $\mathit{size}(t_i) = 0$ for all leaves $t_i$, thus $\mathit{level}(t_i) = i - \mathit{post}(t_i)$.
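
The leaf test and the leaf level formula can be read directly off the signature, as the following Python sketch suggests. The list `SIG` encodes the signature of Fig. 1 as (name, postorder) pairs whose list position plus one is the index; the function name `leaves_with_levels` is a hypothetical choice for this illustration.

```python
# Signature of the tree in Fig. 1: entry i (1-based) is (name, postorder rank).
SIG = [("a", 10), ("b", 5), ("c", 3), ("d", 1), ("e", 2),
       ("g", 4), ("f", 9), ("h", 8), ("o", 6), ("p", 7)]


def leaves_with_levels(sig):
    """Return (name, level) for every leaf, using only the signature.

    An entry is a leaf if it is the last one or if its postorder is smaller
    than the postorder of the next entry; the level of a leaf is i - post.
    """
    result = []
    for i, (name, post) in enumerate(sig, start=1):
        is_last = i == len(sig)
        # sig[i] is the entry with index i + 1, because the list is 0-based.
        if is_last or post < sig[i][1]:
            result.append((name, i - post))
    return result


print(leaves_with_levels(SIG))
```

For the sample signature this yields the leaves d, e, g, o and p, with g on level 2 and the others on level 3.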







Extended Signatures. By extending entries of tree signatures with two preorder numbers
representing pointers to the first following node, ff, and the first ancestor node, fa,
the extended signatures are also defined in [4]. The generic i-th entry of the extended
signature is $\langle t_i, \mathit{post}(t_i), \mathit{ff}_i, \mathit{fa}_i \rangle$. Such a version of the tree signatures makes it possible
to compute levels for any node as $\mathit{level}(t_i) = \mathit{ff}_i - \mathit{post}(t_i) - 1$, because the cardinality
of the descendant node set can be computed as $\mathit{size}(t_i) = \mathit{ff}_i - i - 1$. For the tree in
Fig. 1, the extended signature is: $\mathit{sig}(T) = \langle a, 10, 11, 0;\ b, 5, 7, 1;\ c, 3, 6, 2;\ d, 1, 5, 3;\ e, 2, 6, 3;\ g, 4, 7, 2;\ f, 9, 11, 1;\ h, 8, 11, 7;\ o, 6, 10, 8;\ p, 7, 11, 8 \rangle$.
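
One way to see where the two pointers come from is to derive them from the plain signature. The Python sketch below is a simple quadratic illustration, not the construction used in [4]; the names `extend` and `levels` are hypothetical. It takes the first following node of entry i to be the first later entry with a larger postorder (or n + 1 if none exists), and the first ancestor to be the closest earlier entry with a larger postorder (or 0 for the root).

```python
SIG = [("a", 10), ("b", 5), ("c", 3), ("d", 1), ("e", 2),
       ("g", 4), ("f", 9), ("h", 8), ("o", 6), ("p", 7)]


def extend(sig):
    """Add the first-following (ff) and first-ancestor (fa) pointers to each entry."""
    n = len(sig)
    extended = []
    for i, (name, post) in enumerate(sig, start=1):
        # First entry after i with a larger postorder, else n + 1.
        ff = next((j for j in range(i + 1, n + 1) if sig[j - 1][1] > post), n + 1)
        # Closest entry before i with a larger postorder, else 0.
        fa = next((j for j in range(i - 1, 0, -1) if sig[j - 1][1] > post), 0)
        extended.append((name, post, ff, fa))
    return extended


def levels(extended):
    """level(t_i) = ff_i - post(t_i) - 1 for every entry."""
    return {name: ff - post - 1 for name, post, ff, fa in extended}


EXT = extend(SIG)
print(EXT)
print(levels(EXT))
```

On the sample signature the derived entries coincide with the extended signature listed above.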



Sub-Signatures. A sub-signature $\mathit{sub\_sig}_S(T)$ is a specialized (restricted) view of T
through signatures, which retains the original hierarchical relationships of elements
in T. Specifically, $\mathit{sub\_sig}_S(T) = \langle t_{s_1}, \mathit{post}(t_{s_1});\ t_{s_2}, \mathit{post}(t_{s_2});\ \ldots;\ t_{s_k}, \mathit{post}(t_{s_k}) \rangle$ is
a sub-sequence of $\mathit{sig}(T)$, defined by the ordered set $S = \{s_1, s_2, \ldots, s_k\}$ of indexes
(preorder values) in $\mathit{sig}(T)$, such that $1 \leq s_1 < s_2 < \ldots < s_k \leq n$. Naturally, the set
operations of the union and the intersection can be applied on sub-signatures provided
the sub-signatures are derived from the same signatures and the results are kept sorted.
For example, consider two sub-signatures of the signature representing the tree in Fig. 1,
defined by ordered sets $S_1 = \{2, 3, 4\}$ and $S_2 = \{2, 3, 5, 6\}$. The union of $S_1$ and $S_2$
is the set $\{2, 3, 4, 5, 6\}$, that is the sub-signature representing the subtree rooted at the
node b of our sample tree.
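
Because a sub-signature is fully determined by its index set, union and intersection reduce to ordinary set operations followed by sorting. A minimal Python sketch, with the hypothetical helper name `sub_sig`:

```python
SIG = [("a", 10), ("b", 5), ("c", 3), ("d", 1), ("e", 2),
       ("g", 4), ("f", 9), ("h", 8), ("o", 6), ("p", 7)]


def sub_sig(sig, indexes):
    """Sub-signature of sig selected by a set of 1-based indexes, kept sorted."""
    return [sig[i - 1] for i in sorted(indexes)]


s1 = {2, 3, 4}
s2 = {2, 3, 5, 6}
print(sub_sig(SIG, s1 | s2))  # union: the subtree rooted at b
print(sub_sig(SIG, s1 & s2))  # intersection: entries b and c
```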



2.2 Ordered Tree Inclusion Evaluation 

Let D and Q be ordered labelled trees. The tree Q is included in D, if D contains all 
elements (nodes) of Q and when the sibling and ancestor relationships of the nodes in 
D are the same as in Q. Using the concept of signatures, we can formally define the 
ordered tree inclusion problem as follows. Suppose the data tree D and the query tree 
Q specified by signatures 

$\mathit{sig}(D) = \langle d_1, \mathit{post}(d_1);\ d_2, \mathit{post}(d_2);\ \ldots;\ d_m, \mathit{post}(d_m) \rangle$,

$\mathit{sig}(Q) = \langle q_1, \mathit{post}(q_1);\ q_2, \mathit{post}(q_2);\ \ldots;\ q_n, \mathit{post}(q_n) \rangle$.

Let $\mathit{sub\_sig}_S(D)$ be the sub-signature (i.e. a subsequence) of $\mathit{sig}(D)$ induced by a name
sequence-inclusion of $\mathit{sig}(Q)$ in $\mathit{sig}(D)$ - a specific query signature can determine zero
or more data sub-signatures. Regarding the node names, any $\mathit{sub\_sig}_S(D) = \mathit{sig}(Q)$,
because $q_i = d_{s_i}$ for all i, but the corresponding entries can have different postorder
values. The following lemma defines the necessary constraints for qualification.

Lemma 1. The query tree Q is included in the data tree D if the following two conditions
are satisfied: (1) on the level of node names, $\mathit{sig}(Q)$ is sequence-included in $\mathit{sig}(D)$,
determining $\mathit{sub\_sig}_S(D)$ through the ordered set of indexes $S = \{s_1, \ldots, s_n\}$, (2) for
all pairs of entries i and j in $\mathit{sig}(Q)$ and $\mathit{sub\_sig}_S(D)$, $i, j = 1, 2, \ldots, |Q| - 1$ and
$i + j \leq |Q|$, $\mathit{post}(q_{i+j}) > \mathit{post}(q_i)$ implies $\mathit{post}(d_{s_{i+j}}) > \mathit{post}(d_{s_i})$ and
$\mathit{post}(q_{i+j}) < \mathit{post}(q_i)$ implies $\mathit{post}(d_{s_{i+j}}) < \mathit{post}(d_{s_i})$.







Proof. Because the index i increases according to the preorder sequence, node i + j
must be either the descendant or the following node of i. If $\mathit{post}(q_{i+j}) < \mathit{post}(q_i)$, the
node i + j in the query is a descendant of the node i, thus also $\mathit{post}(d_{s_{i+j}}) < \mathit{post}(d_{s_i})$ is
required. By analogy, if $\mathit{post}(q_{i+j}) > \mathit{post}(q_i)$, the node i + j in the query is a following
node of i, thus also $\mathit{post}(d_{s_{i+j}}) > \mathit{post}(d_{s_i})$ must hold.

Observe that Lemma 1 defines a weak inclusion of the query tree in the data tree, 
meaning that the parent-child relationships of the query are implicitly reflected in the 
data tree as only the ancestor-descendant. However, due to the properties of preorder 
and postorder ranks, such constraints can easily be strengthened, if required. 

For example, consider the data tree D in Fig. 1 and the query tree Q in Fig. 3.

      h
     / \
    o   p

sig(Q) = ⟨h, 3; o, 1; p, 2⟩

Fig. 3. Sample query tree Q

Such a query qualifies in D: $\mathit{sig}(Q) = \langle h, 3;\ o, 1;\ p, 2 \rangle$ determines
$\mathit{sub\_sig}_S(D) = \langle h, 8;\ o, 6;\ p, 7 \rangle$ through the ordered set $S = \{8, 9, 10\}$ because
(1) $q_1 = d_8$, $q_2 = d_9$, and $q_3 = d_{10}$, and
(2) the postorder of node h is higher than the postorder of nodes o and p, and the postorder
of node o is smaller than the postorder of node p (both in $\mathit{sig}(Q)$ and in $\mathit{sub\_sig}_S(D)$).
If we change in our query tree Q the label h for f, we get $\mathit{sig}(Q) = \langle f, 3;\ o, 1;\ p, 2 \rangle$.
Such a modified query tree is also included in D, because Lemma 1 does not insist on
the strict parent-child relationships, and implicitly considers all such relationships as
ancestor-descendant. However, the query tree with the root g, resulting in
$\mathit{sig}(Q) = \langle g, 3;\ o, 1;\ p, 2 \rangle$, does not qualify, even though it is also sequence-included (on the level
of names) as the sub-signature $\mathit{sub\_sig}_S(D) = \langle g, 4;\ o, 6;\ p, 7 \rangle$ with $S = \{6, 9, 10\}$. The
reason is that the query requires the postorder to go down from g to o (from 3 to 1),
while in the sub-signature it actually goes up (from 4 to 6). That means that o is not a
descendant node of g, as required by the query, which can be verified in Fig. 1.
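
The two conditions of Lemma 1 can be tested mechanically. The following Python sketch is a brute-force illustration only: it enumerates every increasing index set, which is exponential in general and is not the string-processing evaluation of [4]. The function name `ordered_inclusions` is hypothetical, and signatures are encoded as lists of (name, postorder) pairs.

```python
from itertools import combinations

SIG_D = [("a", 10), ("b", 5), ("c", 3), ("d", 1), ("e", 2),
         ("g", 4), ("f", 9), ("h", 8), ("o", 6), ("p", 7)]


def ordered_inclusions(sig_d, sig_q):
    """Return all increasing index sets S (1-based) satisfying Lemma 1."""
    n = len(sig_q)
    found = []
    for s in combinations(range(1, len(sig_d) + 1), n):
        # Condition (1): names of the selected data entries match the query.
        if any(sig_d[s[i] - 1][0] != sig_q[i][0] for i in range(n)):
            continue
        # Condition (2): every pair keeps the same postorder comparison.
        same_order = all(
            (sig_q[j][1] > sig_q[i][1]) == (sig_d[s[j] - 1][1] > sig_d[s[i] - 1][1])
            for i in range(n) for j in range(i + 1, n)
        )
        if same_order:
            found.append(s)
    return found


print(ordered_inclusions(SIG_D, [("h", 3), ("o", 1), ("p", 2)]))  # [(8, 9, 10)]
print(ordered_inclusions(SIG_D, [("f", 3), ("o", 1), ("p", 2)]))  # [(7, 9, 10)]
print(ordered_inclusions(SIG_D, [("g", 3), ("o", 1), ("p", 2)]))  # []
```

The three calls correspond to the three queries discussed in the example: the queries rooted at h and f qualify, while the query rooted at g is rejected by condition (2).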

Multiple nodes with common names may result in multiple tree inclusions. As 
demonstrated in [4], the tree signatures can easily deal with such situations just by 
simply distinguishing between node names and their unique occurrences. 



3 Unordered Tree Pattern Matching 



In this section, we propose an approach to the unordered tree pattern matching using 
the tree signatures. The following definition specifies the notion of the unordered tree 
inclusion. 







Definition 2 (Unordered Tree Inclusion). Given a query twig pattern Q and an XML 
tree D, an unordered tree inclusion of Q in D is identified by a total mapping from nodes 
in Q to some nodes in D, such that only the ancestor-descendant structural relationships 
between nodes in Q are satisfied by the corresponding nodes in D. 

The unordered tree inclusion evaluation essentially searches for a node mapping keep- 
ing the ancestor-descendant relationships of the query nodes in the target data nodes. 
Potentially, tree signatures are suitable for such a task, because they rely on a num- 
bering scheme allowing a unique identification of nodes in the tree and also retaining 
the ancestor-descendant relationships between them. However, signatures assume (data 
and query) trees always ordered, so the serialization of trees based on the preorder and 
postorder ranks captures not only the ancestor-descendant but also the sibling
relationships. For this reason, the unordered tree inclusion cannot be evaluated by directly
checking inclusion properties of the query in the data tree signature. More formally, 
using the concept of tree signatures, the unordered query tree Q is included in the data 
tree D if at least one qualifying index set exists. 

Lemma 2. Suppose the data tree D and the query tree Q to be specified by signatures

$\mathit{sig}(D) = \langle d_1, \mathit{post}(d_1);\ d_2, \mathit{post}(d_2);\ \ldots;\ d_m, \mathit{post}(d_m) \rangle$,

$\mathit{sig}(Q) = \langle q_1, \mathit{post}(q_1);\ q_2, \mathit{post}(q_2);\ \ldots;\ q_n, \mathit{post}(q_n) \rangle$.

The unordered query tree Q is included in the data tree D if the following two conditions
are satisfied: (1) on the level of node names, an ordered set of indexes $S = \{s_1, s_2, \ldots, s_n\}$
exists, $1 \leq s_i \leq m$ for $i = 1, \ldots, n$, such that $d_{s_i} = q_i$, (2) for all pairs of entries i and
j, $i, j = 1, 2, \ldots, |Q| - 1$ and $i + j \leq |Q|$, if $\mathit{post}(q_{i+j}) < \mathit{post}(q_i)$ then
$\mathit{post}(d_{s_{i+j}}) < \mathit{post}(d_{s_i}) \wedge s_{i+j} > s_i$.

Observe that the order of entries in the index set S is determined by the name equality
of condition (1). But unlike for the ordered inclusion, given by Lemma 1, values of
indexes $s_i$ are not necessarily increasing with growing i. Since the query signature is a
sequence of nodes in increasing preorder, ancestor-descendant relationships in condition
(2) are simply recognized by a test of postorders for all pairs of entries,
$post(q_{i+j}) < post(q_i)$: whenever the entry with the higher preorder, i.e. i + j, has a smaller
postorder, the required relationship is found. However, to check the same ancestor-descendant
relationships in the data sub-signature, we must test not only the postorders of the
corresponding pair of entries in the sub-signature, $post(d_{s_{i+j}}) < post(d_{s_i})$, but also
their preorders, $s_{i+j} > s_i$, because the correct location of a relationship is a
two-dimensional problem. If required, any S satisfying the properties specified in Lemma
2 can always undergo a sorting process in order to determine the corresponding
sub-signature of sig(D) qualifying the unordered tree inclusion of Q in D.

Example 1. Consider the query Q and the data tree D in Fig. 4 where the double arrow
represents an ancestor-descendant edge. The only sub-signature qualifying the unordered
tree inclusion of Q in D is defined by the index set {1, 5, 3} and the corresponding
sub-signature is $sub\_sig_{\{1,3,5\}}(D) = \langle a, 5; b, 1; f, 4 \rangle$.
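The two conditions of Lemma 2 can be checked by brute force on the signatures of Example 1. The following is only an illustrative sketch, not the evaluation strategy proposed in this paper: the function name `qualifying_index_sets` is hypothetical, signatures are modelled as Python lists of (name, postorder) pairs in preorder, and S is assumed to contain distinct indexes because it is defined as a set.

```python
from itertools import product

def qualifying_index_sets(sig_q, sig_d):
    """Return every index list [s_1, ..., s_n] (1-based) that meets Lemma 2.

    sig_q and sig_d are lists of (name, postorder) pairs in preorder.
    """
    # Condition (1): data positions carrying the same name as each query node.
    candidates = [
        [s for s, (d_name, _) in enumerate(sig_d, start=1) if d_name == q_name]
        for q_name, _ in sig_q
    ]
    n = len(sig_q)
    result = []
    for s in product(*candidates):
        if len(set(s)) != n:  # assumption: S is a set, so indexes are distinct
            continue
        ok = True
        for i in range(n):
            for k in range(i + 1, n):            # k plays the role of i + j
                if sig_q[k][1] < sig_q[i][1]:    # q_k is a descendant of q_i
                    # Condition (2): lower postorder and higher preorder in D.
                    lower_post = sig_d[s[k] - 1][1] < sig_d[s[i] - 1][1]
                    if not (lower_post and s[k] > s[i]):
                        ok = False
        if ok:
            result.append(list(s))
    return result

sig_q = [("a", 3), ("f", 1), ("b", 2)]
sig_d = [("a", 5), ("a", 3), ("b", 1), ("c", 2), ("f", 4)]
print(qualifying_index_sets(sig_q, sig_d))  # [[1, 5, 3]]
```

The enumeration is exponential in the worst case, which is one reason a cheaper evaluation strategy is needed.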







P. Zezula, F. Mandreoli, and R. Martoglia 



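[Figure 4: the query tree Q (root a with the two branches f and b, drawn with double arrows) and the data tree D (root a with children a and f; the inner a has children b and c).]

$sig(Q) = \langle a, 3, 4, 0; f, 1, 3, 1; b, 2, 4, 1 \rangle$

$sig(D) = \langle a, 5; a, 3; b, 1; c, 2; f, 4 \rangle$

Fig. 4. Sample of query evaluation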
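The signatures printed with Fig. 4 can be reproduced from the tree shapes with one depth-first traversal. The sketch below rests on assumptions: trees are modelled as nested (name, children) tuples, the helper name `extended_signature` is hypothetical, ff is taken to be the preorder rank of the first following node (n + 1 if there is none), and fa the preorder rank of the parent (0 for the root), which is what the printed signatures suggest.

```python
def extended_signature(tree):
    """Return [(name, post, ff, fa), ...] in preorder for a (name, children) tree."""
    entries = []   # one mutable [name, post, ff, fa] record per node
    post = [0]     # running postorder counter

    def visit(node, parent_pre):
        name, children = node
        entries.append([name, None, None, parent_pre])
        pre = len(entries)                  # 1-based preorder rank of this node
        for child in children:
            visit(child, pre)
        post[0] += 1
        entries[pre - 1][1] = post[0]       # postorder rank
        # The subtree occupies preorder ranks pre..len(entries), so the first
        # following node, if any, has the next rank.
        entries[pre - 1][2] = len(entries) + 1

    visit(tree, 0)
    return [tuple(e) for e in entries]

query = ("a", [("f", []), ("b", [])])
data = ("a", [("a", [("b", []), ("c", [])]), ("f", [])])
print(extended_signature(query))
# [('a', 3, 4, 0), ('f', 1, 3, 1), ('b', 2, 4, 1)]
print([(name, p) for name, p, _, _ in extended_signature(data)])
# [('a', 5), ('a', 3), ('b', 1), ('c', 2), ('f', 4)]
```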



The solution we propose basically employs tree signatures to represent data trees. 
Then we transform the query tree into multiple (partial) queries and evaluate the ordered 
inclusion of such multiple queries to obtain an answer to the unordered tree inclusion. 
Given the data tree D specified by signature sig(D) and the query tree Q specified
by signature sig(Q), the unordered tree inclusion can be evaluated in the following two
alternative ways:

- Consider all and only such permutations $Q_i$ of the query Q satisfying its
ancestor-descendant relationships and compute the answers to the ordered inclusion of the
corresponding signatures $sig(Q_i)$ in the data signature sig(D). The union of the
partial answers is the result of the unordered inclusion of Q in D. Indeed, the
signature of any $Q_i$ maintains all the ancestor-descendant relationships of Q and
a specific form of the sibling relationships, and the ordered tree inclusion evaluation
checks both types of relationships.

- Decompose the query tree Q into a set of root-to-leaf paths $P_i$ and evaluate the
inclusion of the corresponding signatures $sig(P_i)$ in the data signature sig(D).
Any path $P_i$ represents all (and only) the ancestor-descendant relationships between
the involved nodes. Thus, an ordered inclusion of $sig(P_i)$ in sig(D) states that
a mapping, keeping the ancestor-descendant relationships, exists from the nodes in
$P_i$ to some nodes in D. If there are structurally consistent answers to the ordered
inclusion of all the paths $P_i$ in D, the unordered tree inclusion of Q in D is found.

In the following, we first analyze the second approach, and then we experimentally 
compare it with the first one. In principle, the decomposition approach consists of the 
following three steps: 

1. decomposition of the query Q into a set of paths $P_i$;

2. evaluation of the inclusion of the corresponding signatures sig(Pi) in the data sig- 
nature sig(D); 

3. identification of the set of answers to the unordered inclusion of Q in D. 







Input: query Q represented by the extended signature
$sig(Q) = \langle t_1, post(t_1), ff_1, fa_1; \ldots; t_n, post(t_n), ff_n, fa_n \rangle$

Output: the ordered set rew(Q) of paths of sig(Q) defined by the index sets $P_j$

Algorithm:

(1) for j from 1 to n do

(2)   if (ff_j = (j + 1))

(3)     i = j;

(4)     P_j = {i};

(5)     while (fa_i <> 0)

(6)       P_j = P_j ∪ {fa_i};

(7)       i = fa_i;

(8)     push(rew(Q), P_j);

(9) sort(rew(Q));

Fig. 5. The query decomposition algorithm



3.1 Query Decomposition 

The query decomposition process transforms a query twig into a set of root-to-leaf paths 
so that the ordered tree inclusion can be safely applied. For efficiency reasons, we sort the 
paths on the basis of their selectivity, so that in the next phase, the more selective paths 
are evaluated before the less selective ones. Fig. 5 shows an algorithm based on the above 
assumption for the detection of all the root-to-leaf paths of a query Q represented by
the extended signature $sig(Q) = \langle t_1, post(t_1), ff_1, fa_1; \ldots; t_n, post(t_n), ff_n, fa_n \rangle$.
Firstly, it identifies all the root-to-leaf paths and then sorts them assuming a predefined
policy. The outcome is an ordered set rew(Q) of the sub-signatures $sub\_sig_{P_j}(Q)$
defined by the index sets $P_j$, for each leaf j. It sequentially scans the signature sig(Q)
(line 1) and, whenever it finds a leaf j (line 2), it follows the path connecting j with
the tree root (lines 3-7). The nodes found in the path constitute the set of indexes $P_j$.
Finally, in line 9, it sorts the sets of indexes on the basis of their selectivity. As statistics 
about the selectivity of paths are lacking, we suppose that the longer the path, the more 
selective it is. Recall that the length of any path corresponds to the level of the path’s 
leaf node. In this case, as shown in Sec. 2, the level of any leaf j can be easily computed 
from the extended signature sig(Q) and paths can be sorted according to the leaf node 
level in descending order. The outcome is an ordered set of root-to-leaf paths covering 
the overall query Q and arranged according to the selectivity. 

Example 2. Let us consider the query tree in Fig. 6. The algorithm sequentially scans
the signature sig(Q) up to j = 4 since $ff_4 = 5$ (no descendant nodes), so the 4-th
node is a leaf defining $P_4 = \{1, 2, 3, 4\}$. Then the algorithm iterates by starting from
j = 5. Assuming that the paths are ordered on the basis of their lengths, the final
outcome is $rew(Q) = \{P_4, P_5, P_7, P_8\}$ such that $P_5 = \{1, 2, 5\}$, $P_7 = \{1, 6, 7\}$, and
$P_8 = \{1, 6, 8\}$. Notice that, as rew(Q) is an ordered set, paths will be evaluated in the
same order as they appear in rew(Q).
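The decomposition of Fig. 5 translates almost line by line into executable form. The sketch below is one possible rendering under stated assumptions: the function name `decompose` is hypothetical, the extended signature is a Python list of (name, post, ff, fa) tuples, and path length stands in for selectivity, as suggested in the text. It is applied to the signature of Fig. 6.

```python
def decompose(sig_q):
    """Return the root-to-leaf paths of sig_q as sorted lists of 1-based indexes.

    sig_q is a list of (name, post, ff, fa) tuples in preorder.
    """
    paths = []
    for j in range(1, len(sig_q) + 1):
        ff = sig_q[j - 1][2]
        if ff == j + 1:                      # leaf: its first following node is next
            path = [j]
            i = j
            while sig_q[i - 1][3] != 0:      # climb first-ancestor pointers to the root
                i = sig_q[i - 1][3]
                path.append(i)
            paths.append(sorted(path))
    # Longer paths are assumed to be more selective. Python's sort is stable,
    # so paths of equal length keep the order in which they were found.
    return sorted(paths, key=len, reverse=True)

sig_q = [
    ("a", 8, 9, 0), ("b", 4, 6, 1), ("c", 2, 5, 2), ("d", 1, 5, 3),
    ("g", 3, 6, 2), ("h", 7, 9, 1), ("o", 5, 8, 6), ("p", 6, 9, 6),
]
print(decompose(sig_q))
# [[1, 2, 3, 4], [1, 2, 5], [1, 6, 7], [1, 6, 8]]
```

The printed result corresponds to the ordered set of paths reported in Example 2.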








[Figure 6: query tree with root a; below a are b and h; below b are c and g; below c is d; below h are o and p.]

$sig(Q) = \langle a, 8, 9, 0; b, 4, 6, 1; c, 2, 5, 2; d, 1, 5, 3; g, 3, 6, 2; h, 7, 9, 1; o, 5, 8, 6; p, 6, 9, 6 \rangle$

Fig. 6. Sample query tree



3.2 Path Inclusion Evaluation 

After the query has been decomposed into a sequence of paths, it has to be evaluated
against the data signature. The answers to the unordered tree inclusion of Q in D
are computed by joining the solutions to the individual paths of rew(Q). As far as
the evaluation of each individual path $P \in rew(Q)$ with respect to a data tree D is
concerned, it can be performed in an ordered fashion: for path queries, the ordered
evaluation coincides with the unordered one. As each $P \in rew(Q)$ identifies a path
of Q, we know that each node is a descendant of the nodes appearing before it
in P. Following the numbering scheme of the sub-signature
$sub\_sig_P(Q) = \langle t_{s_1}, post(t_{s_1}), ff_{s_1}, fa_{s_1}; \ldots; t_{s_h}, post(t_{s_h}), ff_{s_h}, fa_{s_h} \rangle$
defined by $S = \{s_1 = p_1, \ldots, s_h = p_h\}$, the postorder values of subsequent entries i and
i + j ($i, j = 1, 2, \ldots, h - 1$ and $i + j \le h$) always satisfy the inequality
$post(q_{s_i}) > post(q_{s_{i+j}})$, because a descendant always has a smaller postorder rank than
its ancestors (see condition (2) of Lemma 2). The lemma below easily follows from the
above observation and from the fact that inequalities are transitive.

Lemma 3. A path $P \in rew(Q)$ is included in the data tree D, in the sense of Definition 2,
if the following two conditions are satisfied: (1) on the level of node names,
$sub\_sig_P(Q)$ is sequence-included in sig(D) determining $sub\_sig_S(D)$ through the
ordered set of indexes $S = \{s_1, \ldots, s_h\}$; (2) for each $i \in [1, h - 1]$:
$post(d_{s_i}) > post(d_{s_{i+1}})$.

For each path query $P \in rew(Q)$, we are thus able to compute the answer set
$ans_P(D) = \{S \mid sub\_sig_S(D) \text{ qualifies the inclusion of } P \text{ in } D\}$. Such evaluation is simpler than
the ordered tree inclusion evaluation of Lemma 1, because the path relationships are
strictly of the ancestor-descendant type. Since the relationship, expressed by the
inequality >, is transitive, we can simply check inequalities between postorder values of
adjacent entries, and limit the verification to h - 1 checks, provided the length of the path
P is h. Notice that, as in the ordered tree inclusion evaluation case, all hierarchical
relationships in the query tree are implicitly considered as ancestor-descendant rather
than parent-child relationships. In case the parent-child relationships are strictly
required, an additional simple control through the first ancestor pointer is necessary.
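The evaluation of a single path can be sketched as a small backtracking search over the data signature. The code below is an illustration under assumptions, not the authors' implementation: the function name `path_answers` is hypothetical, the data signature is a list of (name, postorder) pairs in preorder, and each step relies on the fact, already used in Lemma 2, that a descendant has a higher preorder and a lower postorder than its ancestor. Only adjacent entries are compared, since the relationship is transitive.

```python
def path_answers(path_names, sig_d):
    """Return all 1-based index lists in sig_d that match a root-to-leaf path.

    path_names lists the node names of the path from the root to the leaf.
    """
    answers = []

    def extend(prefix):
        if len(prefix) == len(path_names):
            answers.append(list(prefix))
            return
        start = prefix[-1] + 1 if prefix else 1    # preorder must increase
        for s in range(start, len(sig_d) + 1):
            name, post = sig_d[s - 1]
            if name != path_names[len(prefix)]:
                continue
            # Adjacent check only: the postorder must drop at every step.
            if prefix and post >= sig_d[prefix[-1] - 1][1]:
                continue
            extend(prefix + [s])

    extend([])
    return answers

sig_d = [("a", 5), ("a", 3), ("b", 1), ("c", 2), ("f", 4)]
print(path_answers(["a", "f"], sig_d))  # [[1, 5]]
print(path_answers(["a", "b"], sig_d))  # [[1, 3], [2, 3]]
```

The two calls use the data tree of Fig. 4 and the two paths of its query, a-f and a-b.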







3.3 Identification of the Answer Set 

The answer set $ans_Q(D)$ of the unordered inclusion of Q in D can be determined by
joining compatible answer sets $ans_P(D)$, for all $P \in rew(Q)$. The main problem is
to establish how to join the answers for the paths in rew(Q) to get the answers of the
unordered inclusion of Q in D. Not all pairs of answers of two distinct sets are necessarily
“joinable”. Any pair of paths $P_i$ and $P_j$ shares a common sub-path
(at least the root) and differs in the other nodes (at least the leaves). Such commonalities
and differences must meet a correspondence in any pair of index sets $S_i \in ans_{P_i}(D)$
and $S_j \in ans_{P_j}(D)$, respectively, in order that they are joinable. In this case, we state
that $S_i \in ans_{P_i}(D)$ and $S_j \in ans_{P_j}(D)$ are structurally consistent.

Example 3 (cont. Ex. 1). Consider again the query Q and the data tree D in Fig. 4.
Notice that the index set {1, 5, 3} satisfies both conditions of Lemma 2 whereas the
index set {2, 5, 3} only matches at the level of node names but it is not a qualifying one.
The rewriting of Q gives rise to the following paths $rew(Q) = \{P_2, P_3\}$, where
$P_2 = \{1, 2\}$ and $P_3 = \{1, 3\}$, and the outcome of their evaluation is $ans_{P_2}(D) = \{\{1, 5\}\}$ and
$ans_{P_3}(D) = \{\{1, 3\}, \{2, 3\}\}$. The common sub-path between $P_2$ and $P_3$ is $P_2 \cap P_3 = \{1\}$.
The index 1 occurs in the first position both in $P_2$ and $P_3$. From the cartesian product
of $ans_{P_2}(D)$ and $ans_{P_3}(D)$ it follows that the index sets $\{1, 5\} \in ans_{P_2}(D)$ and
$\{1, 3\} \in ans_{P_3}(D)$ are structurally consistent as they share the same value in the first
position and have different values in the second position, whereas $\{1, 5\} \in ans_{P_2}(D)$
and $\{2, 3\} \in ans_{P_3}(D)$ are not structurally consistent and thus are not joinable.

The following definition states the meaning of structural consistency for two generic
subtrees $T_i$ and $T_j$ of Q; paths $P_i$ and $P_j$ are particular instances of $T_i$ and $T_j$.

Definition 3 (Structural consistency). Let Q be a query twig, D a data tree,
$T_i = \{t_i^1, \ldots, t_i^n\}$ and $T_j = \{t_j^1, \ldots, t_j^m\}$ two ordered sets of indexes determining
$sub\_sig_{T_i}(Q)$ and $sub\_sig_{T_j}(Q)$, respectively, $ans_{T_i}(D)$ and $ans_{T_j}(D)$ the answers of
the unordered inclusion of $T_i$ and $T_j$ in D, respectively.

$S_i = \{s_i^1, \ldots, s_i^n\} \in ans_{T_i}(D)$ and $S_j = \{s_j^1, \ldots, s_j^m\} \in ans_{T_j}(D)$ are structurally
consistent if:

- for each pair of common indexes $t_i^h = t_j^k$, $s_i^h = s_j^k$;

- for each pair of different indexes $t_i^h \ne t_j^k$, $s_i^h \ne s_j^k$.

Definition 4 (Join of answers). Given two structurally consistent answers
$S_i \in ans_{T_i}(D)$ and $S_j \in ans_{T_j}(D)$, where $T_i = \{t_i^1, \ldots, t_i^n\}$, $T_j = \{t_j^1, \ldots, t_j^m\}$,
$S_i = \{s_i^1, \ldots, s_i^n\}$ and $S_j = \{s_j^1, \ldots, s_j^m\}$, the join of $S_i$ and $S_j$, $S_i \bowtie S_j$, is defined
on the ordered set $T_i \cup T_j = \{t^1, \ldots, t^k\}$ as the index set $\{s^1, \ldots, s^k\}$ where:

- for each $h = 1, \ldots, n$, $l \in \{1, \ldots, k\}$ exists such that $t_i^h = t^l$ and $s_i^h = s^l$;

- for each $h = 1, \ldots, m$, $l \in \{1, \ldots, k\}$ exists such that $t_j^h = t^l$ and $s_j^h = s^l$.

Any answer to the unordered inclusion of Q in D is the result of a sequence of joins
of structurally consistent answers, one for each $P \in rew(Q)$, identifying distinct paths
in sig(D). The answer set $ans_Q(D)$ can thus be computed by sequentially joining the
sets of answers of the evaluation of the path queries. We denote such operation as the
structural join.







$ans_{P_2}(D)$:

  1  2
  ----
  1  5

$ans_{P_3}(D)$:

  1  3
  ----
  1  3
  2  3

$sj(ans_{P_2}(D), ans_{P_3}(D))$:

  1  2  3
  -------
  1  5  3

Fig. 7. Structural join of Ex. 1



Definition 5 (Structural join). Let Q be a query twig, D a data tree, $T_i$ and $T_j$
two ordered sets of indexes determining $sub\_sig_{T_i}(Q)$ and $sub\_sig_{T_j}(Q)$, respectively,
$ans_{T_i}(D)$ and $ans_{T_j}(D)$ the answers of the unordered inclusions of $T_i$ and $T_j$ in D,
respectively.

The structural join $sj(ans_{T_i}(D), ans_{T_j}(D))$ between the two sets $ans_{T_i}(D)$ and
$ans_{T_j}(D)$ is the set $ans_T(D)$ where:

- $T = \{t^1, \ldots, t^k\}$ is the ordered set obtained by the union $T_i \cup T_j$ of the ordered
sets $T_i$ and $T_j$;

- $ans_T(D)$ contains the join $S_i \bowtie S_j$ of each pair of structurally consistent answers
($S_i \in ans_{T_i}(D)$, $S_j \in ans_{T_j}(D)$).

The structural join $sj(ans_{T_i}(D), ans_{T_j}(D))$ thus returns an answer set defined on
the union of two sub-queries $T_i$ and $T_j$ as the join of the structurally consistent
answers of $ans_{T_i}(D)$ and $ans_{T_j}(D)$. Starting from the set of answers
$\{ans_{P_{x_1}}(D), \ldots, ans_{P_{x_k}}(D)\}$ for paths in rew(Q), we get the answer set $ans_Q(D)$ identifying
the unordered inclusion of Q in D by incrementally merging the answer sets by means
of the structural join. Since the structural join operator is associative and symmetric, we
can compute $ans_Q(D)$ as:

$ans_Q(D) = sj(ans_{P_{x_1}}(D), \ldots, ans_{P_{x_k}}(D))$    (1)

where $rew(Q) = \{P_{x_1}, \ldots, P_{x_k}\}$.

Example 4. The answer set $ans_Q(D)$ of Example 1 is the outcome of the structural
join $sj(ans_{P_2}(D), ans_{P_3}(D)) = ans_{P_2 \cup P_3}(D)$ where $P_2 \cup P_3 = \{1, 2\} \cup \{1, 3\}$ is
the ordered set {1, 2, 3}. The answers to the individual paths and the final answers are
shown in Fig. 7 (the first line of each table represents the query). It joins the only pair
of structurally consistent answers: $\{1, 5\} \in ans_{P_2}(D)$ and $\{1, 3\} \in ans_{P_3}(D)$.
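Definitions 3 to 5 can be illustrated with a few lines of code. In this sketch, which is an assumption about representation rather than the authors' data structure, an answer is a Python dict from query index to data index, and the names `structurally_consistent` and `structural_join` are hypothetical. Two answers are consistent exactly when equal query indexes map to equal data indexes and different query indexes map to different data indexes.

```python
def structurally_consistent(a, b):
    """Definition 3 for two answers given as {query index: data index} dicts."""
    for t_a, s_a in a.items():
        for t_b, s_b in b.items():
            # Same query node <=> same data node, for every pair of entries.
            if (t_a == t_b) != (s_a == s_b):
                return False
    return True

def structural_join(ans_a, ans_b):
    """Definition 5: join every structurally consistent pair of answers."""
    return [
        {**a, **b}                       # Definition 4: union of the two mappings
        for a in ans_a
        for b in ans_b
        if structurally_consistent(a, b)
    ]

# Answer sets of Example 4 (paths P2 = {1, 2} and P3 = {1, 3}).
ans_p2 = [{1: 1, 2: 5}]
ans_p3 = [{1: 1, 3: 3}, {1: 2, 3: 3}]
joined = structural_join(ans_p2, ans_p3)
print(joined)  # [{1: 1, 2: 5, 3: 3}]

# Example 5 adds a path P4 = {1, 4} whose only answer reuses data node 5.
ans_p4 = [{1: 1, 4: 5}]
print(structural_join(joined, ans_p4))  # []
```

The second call returns an empty list because query nodes 2 and 4 would both be mapped to data node 5.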

Example 5. Given the query Q in Fig. 8, in this example we show the evaluation of
the unordered tree inclusion of Q in the data tree D from Fig. 4. It can be easily verified
that there is no qualifying sub-signature since at most two of the three paths find
a correspondence in the data tree.

The rewriting phase produces the set $rew(Q) = \{P_2, P_3, P_4\}$ where $P_2 = \{1, 2\}$,
$P_3 = \{1, 3\}$, and $P_4 = \{1, 4\}$. The final result $ans_Q(D)$ is the outcome of the structural
join:

$sj(ans_{P_2}(D), ans_{P_3}(D), ans_{P_4}(D)) = sj(sj(ans_{P_2}(D), ans_{P_3}(D)), ans_{P_4}(D)) = \emptyset$

The answer sets of the separate paths and of $sj(ans_{P_2}(D), ans_{P_3}(D))$ are shown in
Fig. 9. The final result is empty since the only candidate pair of answers,
$\{1, 5, 3\} \in sj(ans_{P_2}(D), ans_{P_3}(D))$ and $\{1, 5\} \in ans_{P_4}(D)$, is not structurally consistent: the
two different query nodes $2 \in P_2 \cup P_3$ and $4 \in P_4$ correspond to the same data node 5.
It means that there are not as many data tree paths as query tree paths.

[Figure 8: query tree with root a and the three branches f, b, and f.]

$sig(Q) = \langle a, 4, 5, 0; f, 1, 3, 1; b, 2, 4, 1; f, 3, 5, 1 \rangle$

Fig. 8. Sample of query

$ans_{P_2}(D)$:

  1  2
  ----
  1  5

$ans_{P_3}(D)$:

  1  3
  ----
  1  3
  2  3

$ans_{P_4}(D)$:

  1  4
  ----
  1  5

$sj(ans_{P_2}(D), ans_{P_3}(D))$:

  1  2  3
  -------
  1  5  3

Fig. 9. Structural join from Example 5



Theorem 1. Given a query twig Q and a data tree D, the answer set ansQ(D) as defined 
by Eq. 1 contains all and only the index sets S qualifying the unordered inclusion of Q 
in D according to Lemma 2. 



4 Efficient Computation of the Answer Set 

Till now, we have studied how tree signatures can be employed to support unordered tree 
pattern matching. However, XML data trees can have many nodes and the tree signatures, 
linearly proportional to the number of nodes, can be very large, so the performance
aspects of such an operation become a matter of concern. In the previous section, we have
specified two distinct phases for unordered tree pattern matching: the computation of 
the answer set for each root-to-leaf path of the query and the structural join of such sets. 
The main drawback of this approach is that many intermediate results may not be part 
of any final answer. In the following, we show how these two phases can be merged into 
one to avoid unnecessary computations. The basic idea is to evaluate at each step the 
most selective path among the available ones and to directly combine the partial results 
computed with structurally consistent answers of the paths. 

The full algorithm is depicted in Fig. 10. It makes use of the pop operation which 
extracts the next element from the ordered set of paths rew(Q). The algorithm essentially 
computes the answer set by incrementally joining the partial answers collected up to that 
moment with the answer set of the next path P in rew(Q). As paths are sorted by their 
selectivity, P is the most selective path among those which have not been evaluated 








Input: the paths of the rewriting phase rew(Q)

Output: $ans_Q(D)$

Algorithm:

(1) P = pop(rew(Q));

(2) pQ = P;

(3) evaluate ans_pQ(D);

(4) while ((rew(Q) not empty) AND (ans_pQ(D) not empty))

(5)   P = pop(rew(Q));

(6)   pP = P \ (P ∩ pQ);

(7)   t_k is the parent of pP, k is the position in pQ;

(8)   PAns = ∅;

(9)   for each answer S in ans_pQ(D)

(10)    evaluate ans_pP(sub_sig_{s_k + 1, ..., ff_{s_k} - 1}(D));

(11)    if (ans_pP(sub_sig_{s_k + 1, ..., ff_{s_k} - 1}(D)) not empty)

(12)      add sj({S}, ans_pP(sub_sig_{s_k + 1, ..., ff_{s_k} - 1}(D))) to PAns;

(13)  pQ = pQ ∪ P;

(14)  ans_pQ(D) = PAns;

Fig. 10. The unordered tree pattern evaluation algorithm



yet. In particular, from step 1 to step 3, the algorithm initializes the partial query pQ
evaluated up to that moment to the most selective path P and stores in the partial answer set
$ans_{pQ}(D)$ the evaluation of the inclusion of pQ in D. From step 4 to step 12, it iterates
the process by joining the partial answer set $ans_{pQ}(D)$ with the answer set $ans_P(D)$
of the next path P of rew(Q). Notice that, at each step, it does not first compute
the answer set $ans_P(D)$ and then the structural join $sj(ans_{pQ}(D), ans_P(D))$ as shown
in Eq. 1, but it rather applies a sort of nested loop algorithm in order to perform the
two phases in one shot. As each pair of index sets must be structurally consistent in
order to be joinable, we compute only those answers in $ans_P(D)$ which are structurally
consistent with some of the answers in $ans_{pQ}(D)$. As a matter of fact, only such answers
may be part of the answers to Q. To do so, the algorithm tries to extend each
answer in $ans_{pQ}(D)$ to the answers to $pQ \cup P$ by only evaluating the sub-path of P
which has not been evaluated in pQ. In particular, step 6 stores in the sub-path pP the
part of the path P to be evaluated which is not in common with the query pQ evaluated
up to that moment: $P \setminus (P \cap pQ)$. Step 7 identifies $t_k$ as the parent of the partial path
pP where k is its position in pQ. For instance, by considering Example 3, the two paths
$P_2$ and $P_3$ of the query Q are depicted in Fig. 11a. If $rew(Q) = \{P_2, P_3\}$, then at step 5
$pQ = P_2$ and, as the part of the path $P_3$ corresponding to the query node a has already
been evaluated while evaluating $P_2$, the partial path pP to be evaluated and the parent
$t_k$ of pP are depicted in Fig. 11b.

For each index set $S \in ans_{pQ}(D)$, each index set in $ans_P(D)$ which is structurally
consistent with S must share the same values in the positions corresponding to the
common sub-path $P \cap pQ$. In other words, we assume that the part of the path P which
is common to pQ has already been evaluated and that the indexes of the data nodes









[Figure 11: panels (a) and (b).]

Fig. 11. Evaluation of paths in Algorithm of Fig. 10: an example



matching $P \cap pQ$ are contained in S. In particular, the index $s_k$ in S actually represents
the entry of the data node matching the query node corresponding to $t_k$. Thus, in order
to compute the answers in $ans_P(D)$ that are structurally consistent with S and then
join them with S, the algorithm extends S to the answers to $P \cup pQ$ by evaluating, only in the
“right” sub-tree of the data tree, the inclusion of the part pP of the path P which has not
been evaluated yet (step 10). As the path P has been split into two branches $P \cap pQ$ and
pP, where $t_k$ is the parent of pP and S contains a set of indexes matching $P \cap pQ$, the
evaluation of pP must be limited to the descendants of the data node $d_{s_k}$, which in the
tree signature correspond to the sequence of nodes having preorder values from $s_k + 1$
up to $ff_{s_k} - 1$. Then it joins S with such answer set by only checking that different
query entries correspond to different data entries (step 12). Notice that, in step 10, by
shrinking the index interval to a limited portion of the data signature, we are able to
reduce the computing time for the sequence inclusion evaluation.

The algorithm ends when we have evaluated all the paths in rew(Q) or when the
partial answer set collected up to that moment, $ans_{pQ}(D)$, becomes empty. The latter
case occurs when we evaluate a path P having no answer which is structurally consistent
with those in $ans_{pQ}(D)$: $sj(ans_{pQ}(D), ans_P(D)) = \emptyset$. In this case, for each answer
S in $ans_{pQ}(D)$ two alternatives exist. Either the evaluation of the partial path pP fails
(line 11), which means that none of the answers in $ans_P(D)$ share the same values of
S in the positions corresponding to the common sub-path $P \cap pQ$, or the structural join
between S and the answers to pP fails (line 12), which means that some of the answers
in $ans_P(D)$ share the same values of S in positions corresponding to different indexes
in P and pQ.

Example 6. Let us apply the algorithm described in Fig. 10 to Example 3 where the
signatures involved are $sig(Q) = \langle a, 3, 4, 0; f, 1, 3, 1; b, 2, 4, 1 \rangle$ and
$sig(D) = \langle a, 5; a, 3; b, 1; c, 2; f, 4 \rangle$. Since the two paths are of the same length, we start from
$P_2 = \{1, 2\}$ whose answer set is $ans_{P_2}(D) = \{\{1, 5\}\}$. Then, the algorithm essentially deals with
the next path, e.g. $P_3 = \{1, 3\}$, in the way shown in Fig. 11. It computes $P_2 \cap P_3 = \{1\}$,
$t_k = 1$ where k = 1, and $pP = \{3\}$. It then considers the only index set $S = \{1, 5\}$ in
$ans_{P_2}(D)$ and stores in $ans_{pP}(D)$ the index sets qualifying the inclusion of the query
$sub\_sig_{\{3\}}(Q) = \langle b, 2, 4, 1 \rangle$ on the sub-tree rooted by the data node labelled with a and
having index $s_1 = 1$, that is, in the signature $sub\_sig_{\{2,3,4,5\}}(D) = \langle a, 3; b, 1; c, 2; f, 4 \rangle$.







The outcome is thus $ans_{pP}(D) = \{\{3\}\}$ and $ans_Q(D) = \{\{1, 5, 3\}\}$. As $P_2$ and
$P_3$ are of the same length, we can also start from $ans_{P_3}(D) = \{\{1, 3\}, \{2, 3\}\}$. In this
case $pP = \{2\}$ while, as in the previous case, $t_k = 1$ where k = 1. We then consider
the first index set {1, 3} and evaluate $ans_{pP}(D)$ on the descendants of the data
node having index $s_1 = 1$. The answer $ans_{pP}(sub\_sig_{\{2,3,4,5\}}(D))$ to the inclusion of
$sub\_sig_{pP}(Q) = \langle f, 1, 3, 1 \rangle$ in $sub\_sig_{\{2,3,4,5\}}(D) = \langle a, 3; b, 1; c, 2; f, 4 \rangle$ is {{5}}.
Thus $ans_Q(D) = \{\{1, 5, 3\}\}$. For the next index set {2, 3}, it is required to evaluate
$sub\_sig_{pP}(Q)$ on the sub-tree rooted by $s_1 = 2$, that is $sub\_sig_{\{3,4\}}(D) = \langle b, 1; c, 2 \rangle$,
and the answer set $ans_{pP}(D)$ is empty.
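The whole procedure of Fig. 10 can be sketched compactly and run on Example 6. The code below is a simplified illustration under several assumptions: all function names are hypothetical, answers are dicts from query index to data index, the data signature holds only (name, postorder) pairs, and the first following pointer of a data node is recomputed from the postorders instead of being read from an extended signature.

```python
def first_following(sig_d, s):
    """1-based index of the first node after s outside its subtree (n + 1 if none)."""
    for i in range(s + 1, len(sig_d) + 1):
        if sig_d[i - 1][1] > sig_d[s - 1][1]:
            return i
    return len(sig_d) + 1

def eval_path(names, sig_d, lo, hi):
    """Index tuples within [lo, hi] matching names as a chain of descendants."""
    found = []

    def extend(prefix, start, max_post):
        if len(prefix) == len(names):
            found.append(tuple(prefix))
            return
        for i in range(start, hi + 1):
            name, post = sig_d[i - 1]
            if name == names[len(prefix)] and (max_post is None or post < max_post):
                extend(prefix + [i], i + 1, post)

    extend([], lo, None)
    return found

def unordered_inclusion(q_names, rew, sig_d):
    """q_names: {query index: name}; rew: ordered list of root-to-leaf paths."""
    paths = list(rew)
    pq = paths.pop(0)                                             # steps 1-2
    first = eval_path([q_names[t] for t in pq], sig_d, 1, len(sig_d))
    answers = [dict(zip(pq, s)) for s in first]                   # step 3
    while paths and answers:                                      # step 4
        p = paths.pop(0)                                          # step 5
        pp = [t for t in p if t not in pq]                        # step 6
        tk = [t for t in p if t in pq][-1]                        # step 7
        partial = []                                              # step 8
        for s in answers:                                         # step 9
            lo, hi = s[tk] + 1, first_following(sig_d, s[tk]) - 1
            for ext in eval_path([q_names[t] for t in pp], sig_d, lo, hi):
                # Step 12: different query nodes need different data nodes.
                if not set(ext) & set(s.values()):
                    partial.append({**s, **dict(zip(pp, ext))})
        pq = sorted(set(pq) | set(p))                             # step 13
        answers = partial                                         # step 14
    return answers

q_names = {1: "a", 2: "f", 3: "b"}
sig_d = [("a", 5), ("a", 3), ("b", 1), ("c", 2), ("f", 4)]
print(unordered_inclusion(q_names, [[1, 2], [1, 3]], sig_d))  # [{1: 1, 2: 5, 3: 3}]
```

Starting from the other path, i.e. passing `[[1, 3], [1, 2]]`, yields the same mapping, as described in the second half of Example 6.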

In summary, the proposed solution performs a small number of additional operations 
on the paths of the query twig Q , but dramatically reduces the number of operations on 
the data trees by avoiding the computation of useless path answers. In this way, we 
remarkably reduce computing efforts. Indeed, while query twigs are usually very small 
and have a limited number of paths, XML data trees can have many nodes and tree 
signatures can be very large. 



Table 1. DBLP Test-Collection Statistics

Middle-level                  Leaf-level
Element name        Occs      Element name        Occs
inproceedings     241244      author            823369
article           129468      title             376737
proceedings         3820      year              376709
incollection        1079      url               375192
book                1010      pages             361989
phdthesis             72      booktitle         245795
mastersthesis          5      ee                143068
                              crossref          141681
                              editor              8032
                              publisher           5093
                              isbn                4450
                              school                77

Summary
Total number of elements     3814975
Total number of values       3438237
Maximum tree height                3



5 Performance Evaluation 

In this section we evaluate the performance of our unordered tree inclusion technique. We 
measure the time needed to process different query twigs using the path decomposition
approach, described in detail in this paper, and compare the obtained results with the query
processing performance of the permutation approach.








All algorithms are implemented in Java JDK 1.4.2 and the experiments are executed
on a Pentium 4 2.5 GHz Windows XP Professional workstation, equipped with 512 MB
RAM and a RAID0 cluster of two 80 GB EIDE disks with NT file system (NTFS).

Since synthetic data sets are not significant enough to show the performance of real-life
XML query scenarios, we performed our experiments on a real data set, specifically
the complete DBLP Computer Science Bibliography archive as of April 2003. Table
1 shows more details about this XML archive. Notice that the file consists of over
3.8 million elements, over 3.4 million of which have associated values. The
size of the XML file is 156 MB.



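[Figure 12: the six query twig templates xL2-2, xH2-2, xL3-2, xH3-2, xL8-2, and xH7-3. Recoverable labels: the xL templates are rooted at inproceedings; the leaves of the two-branch templates are author and title, and those of the three-branch templates are author, title, and year.]

Fig. 12. The query twigs templates used in the performance tests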



We tested the performance of our approach for queries derived from six query twig
templates (see Fig. 12). Such templates present different element name selectivity, i.e.
the number of elements having a given element name, different branching factors, i.e.
the maximum number of sibling elements, and different tree heights. We refer to the
templates as “xSb-h”, where S stands for element name selectivity and can be H(igh) or
L(ow), b is the branching factor, and h the tree height. To understand the element name
selectivity, refer to Table 1, showing the number of occurrences of each name in the
DBLP data set. In particular, we used inproceedings for the low selectivity and book
and phdthesis for the high selectivity.

We have conducted experiments by using not only queries defined by the plain
templates (designated as “NSb-h”), which only contain tree structural relationships, but
also queries (designated as “VSb-h”), where the templates are extended by predicates
on the author name. Value accesses are supported by a content index. We have chosen
highly content-selective predicates, because we believe that this kind of query is
especially significant for highly selective fields, such as the author name. On the other
hand, the performance of queries with low selectivity fields should be very close to the
corresponding templates. In this way, we measure the response time of twelve queries,
half of which contain predicates.






Table 2. Performance comparison of the two unordered tree inclusion alternatives

         Query                            Evaluation
Twig     elements  predicates  solutions  Decomposition  Permutation
         #         #           #          (sec)          N       mean (sec)  total (sec)
NH2-2    3         0           1343       0.016          2       0.014       0.028
NH3-2    4         0           1343       0.016          6       0.015       0.105
NH7-3    10        0           90720      1.1            288     0.9         259.2
NL2-2    3         0           559209     2.2            2       2.28        4.56
NL3-2    4         0           559209     4.2            6       2.49        14.94
NL8-2    9         0           149700     7.7            40320   4.8         193536
VH2-2    3         1           1          0.015          2       0.014       0.028
VH3-2    4         1           1          0.016          6       0.016       0.096
VH7-3    10        2           1          0.031          288     0.03        8.64
VL2-2    3         1           39         0.65           2       0.832       1.664
VL3-2    4         1           36         0.69           6       1.1         6.6
VL8-2    9         1           29         0.718          40320   2.3         92736



Table 2 summarizes the results of the unordered tree inclusion performance tests for
both approaches we considered. For each query twig, the total number of elements and
predicates, the number of solutions (inclusions) found in the data set, and the processing
time, expressed in seconds, are reported. For the permutation approach, the number of
needed permutations and the mean per-permutation processing time are also presented.
It is evident that the decomposition approach is superior: its processing time is lower
than the total time of the permutation approach for every query. In particular, with low
branching factors (i.e. 2), the decomposition approach is roughly twice as fast
for both selectivity settings. With high branching factors (i.e. 3, 8) the speedup
becomes larger and larger, since the number of permutations required in the alternative
approach grows factorially: for queries NL8-2 and VL8-2 the decomposition method is
more than 25,000 times faster. The decomposition approach is particularly fast with the
high selectivity queries. Even for greater heights (i.e. in VH7-3), the processing time
remains in milliseconds.
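The factorial growth can be made concrete with a short calculation. The sketch below assumes that the permutations counted for a twig are the independent reorderings of the children of each node, so their number is the product of the factorials of the fan-outs; this matches the N column of Table 2 for the two-level twigs (2, 6, and 40320). The function name `count_permutations` and the dict-of-children encoding are illustrative choices.

```python
from math import factorial, prod

def count_permutations(children):
    """Number of sibling orderings of a tree given as {node: [child, ...]}."""
    return prod(factorial(len(kids)) for kids in children.values())

# A two-level twig with 8 leaves, as in the templates with branching factor 8.
flat_8 = {"root": [f"leaf{i}" for i in range(8)]}
print(count_permutations(flat_8))   # 40320

# The query tree of Fig. 6: fan-outs 2, 2, 1 and 2.
fig6 = {"a": ["b", "h"], "b": ["c", "g"], "c": ["d"], "h": ["o", "p"]}
print(count_permutations(fig6))     # 8

# Speedup of decomposition over permutation for NL8-2, from Table 2.
print(193536 / 7.7 > 25000)         # True
```

The last line divides the total permutation time of NL8-2 by its decomposition time, both taken from Table 2.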

For the decomposition method, as we do not have statistics on the path selectivity at
our disposal, we measured the time needed to solve each query for each of the possible
orders of path evaluation and reported only the lowest one. As we expected, we found
that starting with the most selective paths always increases the query evaluation
efficiency. In particular, the time spent is nearly proportional to the number of occurrences
of such path in the data. Evaluating query NL2-2 starting with the title path produces
a response time of 2.2 seconds, while starting with the less selective author path, the
time would nearly double (3.9 sec.). This holds for all the query twigs as well: for
NL8-2, the time ranges from 7.7 sec (crossref path) up to 15.7 sec (author path). Of
course, for the predicate queries the best time is obtained by starting the evaluation from
the value-enabled paths.

Finally, notice that the permutation approach also requires an initial “startup” phase 
where all the different permutation twigs are generated; the time used to generate such 
permutations is not taken into account. 








6 Conclusions 

In this paper, we have studied the problem of efficient evaluation of unordered query trees
in XML tree structured data collections. As the underlying concept, we have used the
tree signatures, which have proved to be a useful structure for efficient tree navigation
and ordered tree matching, see [4]. We have identified two evaluation strategies, where
the first strategy is based on multiple evaluation of all query tree structure permutations
and the second on decomposing a query tree into a collection of all root-to-leaf paths.

We have studied the decomposition approach in depth and established rules for
decomposition as well as strategies for integration of partial, structurally consistent, results
through structural joins. Based on the developed theoretical grounds, an efficient
implementation algorithm is proposed.

The permutation and decomposition approaches to the unordered tree matching have
been tested on the DBLP data set for various types of queries. The experiments
demonstrate a clear superiority of the decomposition approach, which is especially advantageous
for large query trees and for trees with highly selective predicates.

Experiments also confirmed the expected fact that the order in which the paths are 
evaluated in the decomposition approach can have significant effects on the overall 
performance. Though the proposed strategy of starting with the longest path seems to 
work quite well, we plan to work on this aspect more deeply in the near future. 



References 

1. Dietz, P.F.: Maintaining Order in a Linked List. In: Proceedings of STOC, 14th Annual ACM
Symposium on Theory of Computing, May 1982, San Francisco, CA (1982) 122-127

2. Grust, T.: Accelerating XPath Location Steps. In: Proceedings of the 2002 ACM SIGMOD
International Conference on Management of Data, Madison, Wisconsin (2002) 109-120

3. Shasha, D., Wang, J.T.L., Shan, H., Zhang, K.: ATreeGrep: Approximate Searching in
Unordered Trees. In: Proceedings of the 14th International Conference on Scientific and Statistical
Database Management, July 24-26, 2002, Edinburgh, Scotland, UK (2002) 89-98

4. Zezula, P., Amato, G., Debole, F., Rabitti, F.: Tree Signatures for XML Querying and
Navigation. In: Proceedings of the XML Database Symposium, XSym 2003, Berlin, September
2003, LNCS 2824, Springer (2003) 149-163

5. Zhang, K., Statman, R., Shasha, D.: On the Edit Distance between Unordered Labeled Trees.
Information Processing Letters 42 (1992) 133-139




Quantum Query Complexity for Some Graph Problems*



Aija Berzina, Andrej Dubrovsky, Rusins Freivalds, Lelde Lace, and 
Oksana Scegulnaja 

Institute of Mathematics and Computer Science, University of Latvia, Raina bulv. 29,
Riga, Latvia

aija.berzina@tietoenator.com, a.dubrovskis@alise.lv,
{Rusins.Freivalds, Lelde.Lace}@mii.lu.lv, oksana.s@liis.lv



Abstract. The paper [4] by H. Buhrman and R. de Wolf contains 
an impressive survey of solved and open problems in quantum query 
complexity, including many graph problems. We use recent results by 
A. Ambainis [1] to prove higher lower bounds for some of these problems.
Some of our new lower bounds do not close the gap between the best 
upper and lower bounds. We prove in these cases that it is impossible to 
provide a better application of Ambainis’ technique for these problems. 



1 Introduction 

Recently it has become clear that a quantum computer could, in principle, solve
certain problems faster than a conventional computer. A quantum computer is
a device that takes full advantage of quantum mechanical superposition and
interference. Building an actual quantum computer is probably far off in the
future.

The Boolean decision tree model is the simplest model for computing Boolean
functions. In this model the primitive operation made by an algorithm is
evaluating an input Boolean variable. The cost of a (deterministic) algorithm is the
number of variables it evaluates on a worst case input. It is easy to find the
deterministic complexity of all explicit Boolean functions (for most functions it
is equal to the number of variables).

The black-box model of computation arises when one is given a black box
containing an $N$-tuple of Boolean variables $X = (x_0, x_1, \ldots, x_{N-1})$. The box
is equipped to output $x_i$ on input $i$. We wish to determine some property of
$X$, accessing the $x_i$ only through the black box. Such a black-box access is called
a query. A property of $X$ is any Boolean function that depends on $X$, i.e. a
property is a function $f : \{0, 1\}^N \to \{0, 1\}$. We want to compute such properties
using as few queries as possible.

Consider, for example, the case where the goal is to determine whether or 
not X contains at least one 1, so we want to compute the property OR(X) = 

* Research supported by Grant No. 01. 0354 from the Latvian Council of Science 



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 140-150, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 







$x_0 \lor \ldots \lor x_{N-1}$. It is well known that the number of queries required to compute
OR by any classical (deterministic or probabilistic) algorithm is $\Theta(N)$.
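The classical black-box model described above can be made concrete with a small sketch. The class name `BlackBox` and the function `classical_or` are illustrative assumptions, not part of the paper; the sketch only shows how queries are counted and why a deterministic algorithm for OR may need all $N$ of them.

```python
class BlackBox:
    """Classical black box hiding an N-tuple of bits; counts every query."""

    def __init__(self, bits):
        self._bits = list(bits)
        self.queries = 0

    def __len__(self):
        return len(self._bits)

    def query(self, i):
        """Return x_i and charge one query."""
        self.queries += 1
        return self._bits[i]


def classical_or(box):
    """Deterministic OR: scan until a 1 is found (N queries in the worst case)."""
    for i in range(len(box)):
        if box.query(i) == 1:
            return 1
    return 0


box = BlackBox([0, 0, 0, 0, 0, 0, 0, 0])
result = classical_or(box)
print(result, box.queries)  # on the all-zero input every position is queried
```

On an input that contains a 1 the scan stops early, so the cost depends on the input; the complexity measure in the text is the worst case over all inputs.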

Grover [6] discovered a remarkable quantum algorithm that, making queries
in superposition, can be used to compute OR with small error probability using
only $O(\sqrt{N})$ queries.

On the other hand, quantum algorithms are in a sense more restricted. For
instance, only unitary transformations are allowed for state transitions. Hence
the question often arises whether or not the needed quantum automaton
exists. In such a situation lower bounds of complexity are considered. It is proved
in [3] that Grover's database search algorithm is the best possible. It is proved
in [3] that no quantum query algorithm for PARITY uses fewer than $\Omega(N)$ queries.

We use a result by A. Ambainis [1] to prove lower complexity bounds for
quantum query algorithms. Currently, this is the most powerful method to prove
lower bounds of complexity for quantum query algorithms. In some cases there
still remains a gap between the upper and the lower bounds of the complexity. In
these cases we prove additionally that Ambainis’ method cannot provide a better
lower bound for the problem in question.

2 Definitions 

2.1 Quantum Computing 

We introduce the basic model of quantum computing. For more details, see 
textbooks by Gruska [7] and Nielsen and Chuang [8]. 

Quantum states: We consider finite dimensional quantum systems. An
$n$-dimensional pure state is a vector $|\psi\rangle \in \mathbb{C}^n$ of norm 1. Let $|0\rangle, |1\rangle, \ldots, |n-1\rangle$
be an orthonormal basis for $\mathbb{C}^n$. Then, any state can be expressed as
$|\psi\rangle = \sum_{i=0}^{n-1} a_i|i\rangle$ for some $a_0 \in \mathbb{C}, a_1 \in \mathbb{C}, \ldots, a_{n-1} \in \mathbb{C}$. Since the norm of $|\psi\rangle$ is 1,
$\sum_{i=0}^{n-1} |a_i|^2 = 1$. We call the states $|0\rangle, |1\rangle, \ldots, |n-1\rangle$ basic states. Any state of the
form $\sum_{i=0}^{n-1} a_i|i\rangle$ is called a superposition of $|0\rangle, |1\rangle, \ldots, |n-1\rangle$. The coefficient
$a_i$ is called the amplitude of $|i\rangle$.

A quantum system can undergo two basic operations: a unitary evolution
and a measurement.

Unitary evolution: A unitary transformation $U$ is a linear transformation
on $\mathbb{C}^k$ that preserves the $\ell_2$ norm (i.e., maps vectors of unit norm to vectors of
unit norm). If, before applying $U$, the system was in a state $|\psi\rangle$, then the state
after the transformation is $U|\psi\rangle$.

Measurements: In this paper, we use only the simplest case of quantum
measurement: the full measurement in the computational basis. Performing
this measurement on a state $|\psi\rangle = a_0|0\rangle + \ldots + a_k|k\rangle$ gives the outcome $i$
with probability $|a_i|^2$. The measurement changes the state of the system to $|i\rangle$.
Notice that the measurement destroys the original state $|\psi\rangle$ and repeating the
measurement gives the same $i$ with probability 1 (because the state after the
first measurement is $|i\rangle$).

More general classes of measurements are general von Neumann and POVM 
measurements [8]. 
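The definitions of pure states and of the full measurement can be tried out numerically. The following is a minimal sketch using plain Python lists of complex amplitudes; the helper names `normalize`, `outcome_probabilities` and `measure` are assumptions made for illustration only.

```python
import math
import random


def normalize(amplitudes):
    """Scale a list of complex amplitudes to unit l2 norm."""
    norm = math.sqrt(sum(abs(a) ** 2 for a in amplitudes))
    return [a / norm for a in amplitudes]


def outcome_probabilities(state):
    """Probability |a_i|^2 of observing the basic state |i>."""
    return [abs(a) ** 2 for a in state]


def measure(state, rng=random):
    """Full measurement in the computational basis.

    Returns the outcome i and the collapsed state |i>.
    """
    probs = outcome_probabilities(state)
    i = rng.choices(range(len(state)), weights=probs)[0]
    collapsed = [1.0 if j == i else 0.0 for j in range(len(state))]
    return i, collapsed


psi = normalize([1, 1j, -1, 0])      # equal weight on |0>, |1>, |2>
probs = outcome_probabilities(psi)
i, after = measure(psi)
j, _ = measure(after)                # repeating the measurement gives i again
```

The second call illustrates the remark in the text: once the state has collapsed to $|i\rangle$, measuring again returns the same outcome with probability 1.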







2.2 Query Model 

In the query model, the input $x_1, \ldots, x_N$ is contained in a black box and can
be accessed by queries to the black box. In each query, we give $i$ to the black
box and the black box outputs $x_i$. The goal is to solve the problem with the
minimum number of queries. The classical version of this model is known as
decision trees [4].











[Figure: a black box holding the input values $x_1, x_2, \ldots, x_N$ (shown as 0 1 ... 0)]

Fig. 1. Quantum black box

There are two ways to define the query box in the quantum model.
The first is the extension of the classical query (Figure 1). It has two inputs: $i$,
consisting of $\lceil \log N \rceil$ bits, and $b$, consisting of 1 bit. If the input to the query box
is a basic state $|i\rangle|b\rangle$, the output is $|i\rangle|b \oplus x_i\rangle$. If the input is a superposition
$\sum_{i,b} a_{i,b}|i\rangle|b\rangle$, the output is $\sum_{i,b} a_{i,b}|i\rangle|b \oplus x_i\rangle$. Notice that this definition applies
both to the case when the $x_i$ are binary and to the case when they are $k$-valued. In the
$k$-valued case, we just make $b$ consist of $\lceil \log_2 k \rceil$ bits and take $b \oplus x_i$ to be
the bitwise XOR of $b$ and $x_i$.

In the second form of quantum query (which only applies to problems with
$\{0, 1\}$-valued $x_i$), the black box has just one input $i$. If the input is a state
$\sum_i a_i|i\rangle$, the output is $\sum_i (-1)^{x_i} a_i|i\rangle$. While this form is less intuitive, it is
very convenient for use in quantum algorithms, including Grover’s search
algorithm [6]. A query of the second type can be simulated by a query of the first type [6].
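Both query forms act linearly on amplitudes, which a few lines of code can illustrate. This is a sketch under assumptions: states are stored as a dictionary from basis labels to amplitudes (first form) or as a list (second form), indices are 0-based, and the function names are hypothetical.

```python
import math


def xor_query(x, state):
    """First form: |i>|b> -> |i>|b XOR x_i>.

    `state` maps basis labels (i, b) to complex amplitudes.
    """
    out = {}
    for (i, b), amp in state.items():
        key = (i, b ^ x[i])
        out[key] = out.get(key, 0) + amp
    return out


def phase_query(x, amplitudes):
    """Second form: sum_i a_i |i> -> sum_i (-1)^{x_i} a_i |i>."""
    return [(-1) ** x[i] * a for i, a in enumerate(amplitudes)]


x = [0, 1, 1, 0]                      # hidden input bits (0-indexed here)
uniform = [0.5, 0.5, 0.5, 0.5]        # equal superposition over 4 indices
flipped = phase_query(x, uniform)     # sign flips exactly where x_i = 1
basis = xor_query(x, {(1, 0): 1.0})   # basic state |1>|0> becomes |1>|1>

# Simulating the second form by the first: put b into (|0> - |1>)/sqrt(2).
s = 1 / math.sqrt(2)
minus = {(1, 0): s, (1, 1): -s}
kicked = xor_query(x, minus)          # the whole state picks up the sign (-1)^{x_1}
```

The last three lines show the standard way the sign $(-1)^{x_i}$ can be obtained from the XOR form: when the extra bit is in the state $(|0\rangle - |1\rangle)/\sqrt{2}$, flipping it multiplies the state by $-1$.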

A quantum query algorithm with $T$ queries is just a sequence of unitary
transformations

$U_0 \to O \to U_1 \to O \to \ldots \to U_{T-1} \to O \to U_T$

on some finite-dimensional space $\mathbb{C}^k$. $U_0, U_1, \ldots, U_T$ can be any unitary
transformations that do not depend on the bits $x_1, \ldots, x_N$ inside the black box.
$O$ are query transformations that consist of applying the query box to the first
$\log N + 1$ bits of the state. That is, we represent basic states of $\mathbb{C}^k$ as $|i, b, z\rangle$. Then,
$O$ maps $|i, b, z\rangle$ to $|i, b \oplus x_i, z\rangle$. We use $O_x$ to denote the query transformation
corresponding to an input $x = (x_1, \ldots, x_N)$.

The computation starts with the state $|0\rangle$. Then, we apply $U_0, O_x, \ldots, O_x, U_T$
and measure the final state. The result of the computation is the rightmost bit
of the state obtained by the measurement (or several bits if we are considering
a problem where the answer has more than 2 values).

The quantum algorithm computes a function $f(x_1, \ldots, x_N)$ if, for every $x =
(x_1, \ldots, x_N)$ for which $f$ is defined, the probability that the rightmost bit of
$U_T O_x U_{T-1} \cdots O_x U_0 |0\rangle$ equals $f(x_1, \ldots, x_N)$ is at least $1 - \epsilon$ for some fixed $\epsilon < 1/2$.







The query complexity of $f$ is the smallest number of queries used by a
quantum algorithm that computes $f$. We denote it $Q(f)$.

Our proofs use the following results by A. Ambainis.

Theorem 1. [1] Let $A \subset \{0, 1\}^n$, $B \subset \{0, 1\}^n$ be such that $f(A) = 1$, $f(B) = 0$ and

• for every $x = (x_1, \ldots, x_n) \in A$, there are at least $m$ values $i \in \{1, \ldots, n\}$ such
that $(x_1, \ldots, x_{i-1}, 1 - x_i, x_{i+1}, \ldots, x_n) \in B$,

• for every $x = (x_1, \ldots, x_n) \in B$, there are at least $m'$ values $i \in \{1, \ldots, n\}$ such
that $(x_1, \ldots, x_{i-1}, 1 - x_i, x_{i+1}, \ldots, x_n) \in A$.

Then, $Q(f) = \Omega(\sqrt{m m'})$.
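As an illustration of how Theorem 1 is applied, the quantities $m$ and $m'$ can be computed by brute force for the OR function on a small number of variables. The choice of sets below (inputs with exactly one 1 versus the all-zero input) and the helper names are assumptions made for this sketch; it is not one of the constructions used later in the paper.

```python
import math
from itertools import product


def flip(x, i):
    """Return x with bit i negated."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def min_flips_into(source, target):
    """Smallest, over x in `source`, number of positions whose flip lands in `target`."""
    target = set(target)
    return min(sum(flip(x, i) in target for i in range(len(x))) for x in source)


n = 9
B = [(0,) * n]                                             # OR = 0
A = [x for x in product((0, 1), repeat=n) if sum(x) == 1]  # OR = 1, exactly one 1

m = min_flips_into(A, B)        # each x in A has a single neighbour in B
m_prime = min_flips_into(B, A)  # the all-zero input has n neighbours in A
bound = math.sqrt(m * m_prime)  # sqrt(1 * n)
```

With these sets the theorem gives $Q(\mathrm{OR}) = \Omega(\sqrt{n})$, which matches the number of queries used by Grover's algorithm up to a constant factor.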

Theorem 2. [1] Let $f(x_1, x_2, \ldots, x_n)$ be a function of $n$ $\{0, 1\}$-valued variables
and $X$, $Y$ be two sets of inputs such that $f(x) \neq f(y)$ if $x \in X$ and $y \in Y$. Let
$R \subset X \times Y$ be such that

• for every $x \in X$ there exist at least $m$ different $y \in Y$ such that $(x, y) \in R$,

• for every $y \in Y$ there exist at least $m'$ different $x \in X$ such that $(x, y) \in R$,

• for every $x \in X$ and $i \in \{1, \ldots, n\}$ there are at most $l_i$ different $y \in Y$ such
that $(x, y) \in R$ and $x_i \neq y_i$,

• for every $y \in Y$ and $i \in \{1, \ldots, n\}$ there are at most $l'_i$ different $x \in X$ such
that $(x, y) \in R$ and $x_i \neq y_i$.

Then, any quantum algorithm computing $f$ uses $\Omega\left(\sqrt{\frac{m m'}{\max_i l_i l'_i}}\right)$ queries.



Definition 1. For any Boolean function $f : \{0, 1\}^n \to \{0, 1\}$ and any $x =
(x_1 \ldots x_n)$, $ND(f, x)$ is the number of queries needed by nondeterministic
algorithms on the values $x = (x_1 \ldots x_n)$.

Definition 2. For any Boolean function $f : \{0, 1\}^n \to \{0, 1\}$: $ND_0(f) =
\max_{f(x)=0} ND(f, x)$ and $ND_1(f) = \max_{f(x)=1} ND(f, x)$.

Theorem 3. [2] Whatever the sets $A$ and $B$, Theorem 1 cannot prove a better
lower bound for the query complexity $Q(f)$ than $\sqrt{ND_0(f) \cdot ND_1(f)}$.

We consider the following graph problems in our paper. 

Problems 

Problem 1. Partition into cliques

INSTANCE: Graph $G = (V, E)$, with $|V| = qk$ for fixed integer $q > 1$ and some
integer $k$.

QUESTION: Can the vertices of $G$ be partitioned into $k$ disjoint sets $V_1, V_2, \ldots,
V_k$ such that, for $1 \le i \le k$, the subgraph induced by $V_i$ is a complete graph and
$|V_i| = q$?

Problem 2. Partition into triangles

INSTANCE: Graph $G = (V, E)$, with $|V| = 3k$ for some integer $k$.

QUESTION: Can the vertices of $G$ be partitioned into $k$ disjoint sets $V_1, V_2, \ldots,
V_k$, each containing exactly 3 vertices, such that for each $V_i = \{u_i, v_i, w_i\}$,
$1 \le i \le k$, all three of the edges $\{u_i, v_i\}$, $\{u_i, w_i\}$, $\{v_i, w_i\}$ belong to $E$?

Problem 3. Matching

INSTANCE: Graph $G = (V, E)$, $|V| = n$.

QUESTION: Can the vertices of $G$ be partitioned into $n/2$ disjoint pairs
$P_1, P_2, \ldots, P_{n/2}$ such that for each $P_i = \{u_i, v_i\}$, $1 \le i \le n/2$, the edge $\{u_i, v_i\}$
belongs to $E$?

Problem 4. Parity

INSTANCE: Matrix $M$ of size $2n \times 2n$, $M_{ij} \in \{0, 1\}$.

QUESTION: Is $\sum_{i=1}^{2n} \mathrm{PARITY}(M_i) = n$, where $M_i$ denotes the $i$-th row of $M$?

Problem 5. Hamiltonian circuit

INSTANCE: Graph $G = (V, E)$.

QUESTION: Does $G$ contain a Hamiltonian circuit?

Problem 6. Directed Hamiltonian circuit

INSTANCE: Directed graph $G = (V, A)$.

QUESTION: Does $G$ contain a directed Hamiltonian circuit?

Problem 7. Hamiltonian path

INSTANCE: Graph $G = (V, E)$.

QUESTION: Does $G$ contain a Hamiltonian path?

Problem 8. Travelling salesman

INSTANCE: Set $C$ of $m$ cities, distance $d(c_i, c_j) \in Z^+$ for each pair of cities
$c_i, c_j \in C$, positive integer $B$.

QUESTION: Is there a tour of $C$ having length $B$ or less, i.e. a permutation
$(c_{\pi(1)}, c_{\pi(2)}, \ldots, c_{\pi(m)})$ of $C$ such that

$\left(\sum_{i=1}^{m-1} d(c_{\pi(i)}, c_{\pi(i+1)})\right) + d(c_{\pi(m)}, c_{\pi(1)}) \le B$?

Problem 9. Dominating set for trees

INSTANCE: Tree $G = (V, E)$, positive integer $K \le |V|$.

QUESTION: Is there a dominating set of size $K$ or less for $G$, i.e. a subset $V' \subseteq V$
with $|V'| \le K$ such that for all $u \in V - V'$ there is a $v \in V'$ for which $\{u, v\} \in E$?

Problem 10. Dominating set

INSTANCE: Graph $G = (V, E)$, positive integer $K \le |V|$.

QUESTION: Is there a dominating set of size $K$ or less for $G$, i.e. a subset $V' \subseteq V$
with $|V'| \le K$ such that for all $u \in V - V'$ there is a $v \in V'$ for which $\{u, v\} \in E$?

We use the incidence matrix representation of graphs.



3 Main Results 

Lemma 1. If there are $k+1$ mutually not connected vertices in the graph $G = (V, E)$,
$|V| = kq$, then the Partition into cliques problem is not solvable.

Proof. If there is a solution for Partition into cliques, we get $k$ disjoint sets.
Since there are $k+1$ mutually not connected vertices, there is at least one subset
containing two mutually not connected vertices, so this subset is not a clique and the
Partition into cliques problem is not solvable.

Lemma 2. If graph $G = (V, E)$, $|V| = kq$, satisfies the following requirements:

• there are $k/2$ mutually not connected (red) vertices,

• there are $k$ green vertices not connected with red ones, green vertices are
grouped in pairs and each pair is connected by an edge,

• the subgraph induced by all the remaining (black) vertices is a complete graph and all
black vertices are connected to all red and green vertices,

then the Partition into cliques problem is solvable.

Proof. Vertices are grouped in subsets in accordance with the following:

• each red vertex is put in a separate subset ($k/2$ subsets),

• each pair of green vertices is put in a separate subset ($k/2$ subsets),

• black vertices are added as follows: $q-1$ to each red vertex and $q-2$ to each green pair.

Such a distribution satisfies the Partition into cliques problem.
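The construction in this proof can be checked mechanically on a small instance. The sketch below builds one graph satisfying the requirements of Lemma 2 and verifies that the partition from the proof consists of $k$ cliques of size $q$. The vertex labels, the function names and the sample values $k = 4$, $q = 3$ are assumptions for illustration; $k$ is taken to be even and $q \ge 2$.

```python
from itertools import combinations


def lemma2_graph(k, q):
    """Build a graph as in Lemma 2; returns the colour classes and the edge set."""
    red = [("r", i) for i in range(k // 2)]
    green = [("g", i) for i in range(k)]            # pairs are (2j, 2j+1)
    n_black = k * q - len(red) - len(green)
    black = [("b", i) for i in range(n_black)]
    edges = set()
    for j in range(0, k, 2):                        # one edge inside each green pair
        edges.add(frozenset((green[j], green[j + 1])))
    for u, v in combinations(black, 2):             # black vertices form a clique
        edges.add(frozenset((u, v)))
    for b in black:                                 # black is joined to all red and green
        for v in red + green:
            edges.add(frozenset((b, v)))
    return red, green, black, edges


def lemma2_partition(k, q):
    """Partition from the proof: red + (q-1) black, green pair + (q-2) black."""
    red, green, black, edges = lemma2_graph(k, q)
    pool = iter(black)
    parts = [[r] + [next(pool) for _ in range(q - 1)] for r in red]
    parts += [[green[j], green[j + 1]] + [next(pool) for _ in range(q - 2)]
              for j in range(0, k, 2)]
    return parts, edges


def is_clique(part, edges):
    return all(frozenset(p) in edges for p in combinations(part, 2))


parts, edges = lemma2_partition(k=4, q=3)
ok = len(parts) == 4 and all(len(p) == 3 and is_clique(p, edges) for p in parts)
```

The count of black vertices works out exactly: $(k/2)(q-1) + (k/2)(q-2) = kq - k/2 - k$, so every black vertex is used once.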

Lemma 3. If graph $G = (V, E)$, $|V| = kq$, satisfies the following requirements:

• there are $k/2 + 2$ mutually not connected (red) vertices,

• there are $k-2$ green vertices not connected with red ones, green vertices are
grouped in pairs and each pair is connected by an edge,

• the subgraph induced by all the remaining (black) vertices is a complete graph and all
black vertices are connected to all red and green vertices,

then the Partition into cliques problem is not solvable.

Proof. If we take the red vertices and one vertex from each pair of green vertices then
we get $k/2 + 2 + (k - 2)/2 = k + 1$ vertices. These vertices are mutually not
connected. The Partition into cliques problem is not solvable because the set of
selected vertices satisfies the requirements of Lemma 1.

Theorem 4. Partition into cliques requires $\Omega(n^{1.5})$ quantum queries.

Proof. We construct the sets $A$ and $B$ for the usage of Theorem 1. The set $A$
consists of all graphs $G$ satisfying the requirements of Lemma 2. The value of the
function corresponding to the Partition into cliques problem is 1. (This follows
from Lemma 2.) The set $B$ consists of all graphs $G$ satisfying the requirements of
Lemma 3. The value of the function corresponding to the Partition into cliques
problem is 0. (This follows from Lemma 3.)

From each graph $G \in A$, we can obtain $G' \in B$ by disconnecting any one of
the edges which connect the green vertices. Hence $m = k/2 = \Theta(k)$. From each
graph $G' \in B$, we can obtain $G \in A$ by connecting any two red vertices. Hence
$m' = (k/2 + 2)(k/2 + 1)/2 = \Theta(k^2)$.

Since $q$ is fixed, it follows that $k = \Theta(n)$. By Theorem 1, the quantum query
complexity is $\Omega(\sqrt{n \cdot n^2}) = \Omega(n^{1.5})$.
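The final step of this proof can be written out explicitly. The following short derivation is a worked version of the arithmetic, taking $n = |V| = kq$ as in the statement of the problem:

```latex
\[
\sqrt{m\,m'} \;=\; \sqrt{\frac{k}{2}\cdot\frac{(k/2+2)(k/2+1)}{2}}
\;\ge\; \sqrt{\frac{k}{2}\cdot\frac{k^{2}}{8}} \;=\; \frac{k^{1.5}}{4},
\qquad
k = \frac{n}{q} \;\Longrightarrow\; Q(f) = \Omega\!\left(n^{1.5}\right).
\]
```

The constant $1/(4 q^{1.5})$ is absorbed by the $\Omega$ notation because $q$ is fixed.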







The same idea proves the following two theorems. 

Theorem 5. Matching requires $\Omega(n^{1.5})$ quantum queries.

Theorem 6. Partition into triangles requires $\Omega(n^{1.5})$ quantum queries.

Theorem 7. The lower bound for Partition into cliques cannot be improved by
Ambainis’ method.

Proof. We use Theorem 3. Let the Boolean function $f$ describe Partition into
cliques. $ND_1(f) = O(n)$, because it suffices to ask the edges for all the guessed
subsets of vertices; all the subsets are of constant size. $ND_0(f) = O(n^2)$, because
it suffices to exhibit a subset of $k + 1$ vertices connected by no edges. Since
$k = O(n)$, $(k + 1)k/2 = O(n^2)$.

Hence $\sqrt{ND_1(f) \cdot ND_0(f)} = O(n^{1.5})$.

Theorem 8. Parity problem requires $\Omega(n^2)$ quantum queries.

Proof. We construct the sets $A$ and $B$ for the usage of Theorem 1. The set $A$
consists of all matrices $M$ with $n$ rows containing $n$ symbols “1” per row plus
$n$ rows containing $n+1$ symbols “1” per row. The set $B$ consists of all matrices
$M$ with $n-1$ rows containing $n$ symbols “1” per row plus $n+1$ rows containing
$n+1$ symbols “1” per row.

Every matrix $M \in A$ can be transformed into a matrix $M' \in B$ by taking an
arbitrary row with $n$ symbols “1” and transforming an arbitrary “0” into “1”.
Hence $m = n^2$. Every matrix $M' \in B$ can be transformed into a matrix $M \in A$
by taking an arbitrary row with $n+1$ symbols “1” and transforming an arbitrary
symbol “1” into “0”. Hence $m' = n^2$.

By Theorem 1, the quantum query complexity is $\Omega(\sqrt{n^2 \cdot n^2}) = \Omega(n^2)$. This
is the maximum possible lower bound.
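The counting in this proof can be confirmed by brute force for a small matrix. The sketch below takes $n = 2$ (a $4 \times 4$ matrix), picks one member of each set and counts the single-entry flips that move it into the other set. The two sample matrices and the function names are assumptions for illustration; by symmetry every member of a set has the same count.

```python
def row_counts(M):
    """Sorted numbers of 1s per row."""
    return sorted(sum(row) for row in M)


def in_A(M, n):
    return row_counts(M) == [n] * n + [n + 1] * n


def in_B(M, n):
    return row_counts(M) == [n] * (n - 1) + [n + 1] * (n + 1)


def count_flips(M, n, member):
    """Number of single-entry flips of M whose result satisfies `member`."""
    total = 0
    for r in range(2 * n):
        for c in range(2 * n):
            M[r][c] ^= 1
            total += member(M, n)
            M[r][c] ^= 1          # undo the flip
    return total


n = 2
MA = [[1, 1, 0, 0], [0, 1, 1, 0],                 # two rows with n = 2 ones
      [1, 1, 1, 0], [0, 1, 1, 1]]                 # two rows with n + 1 = 3 ones
MB = [[1, 1, 0, 0],                               # n - 1 = 1 row with 2 ones
      [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1]]   # n + 1 = 3 rows with 3 ones

m = count_flips(MA, n, in_B)        # n rows times n zeros per row
m_prime = count_flips(MB, n, in_A)  # (n + 1) rows times (n + 1) ones per row
```

For $n = 2$ the exact count from $B$ to $A$ is $(n+1)^2 = 9$, which is at least the value $n^2$ used in the proof; Theorem 1 only needs a lower estimate of the number of such flips.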

Lemma 4. If a graph $G = (V, E)$, $|V| = 5n$, satisfies the following requirements:

• there are $n$ mutually not connected (red) vertices,

• there are $2n$ green vertices not connected with red ones, green vertices are
grouped in pairs and each pair is connected by an edge,

• the subgraph induced by the remaining $2n$ (black) vertices is a complete graph and all
black vertices are connected to all red and green vertices,

then the Hamiltonian circuit problem is solvable.

Proof. We denote the black vertices $m_1$ to $m_{2n}$. The red vertices are denoted $k_1$ to
$k_n$, and the pairs of green vertices $k_{n+1}$ to $k_{2n}$. The sequence $m_1 k_1 \ldots m_n k_n m_{n+1} k_{n+1} \ldots m_{2n}
k_{2n} m_1$ (i.e. black, red, ..., black, red, black, green, green, ..., black, green, green,
black) satisfies the Hamiltonian circuit problem.
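The circuit given in this proof can be verified directly. The following sketch encodes the adjacency rule of Lemma 4 as a predicate and checks that the sequence from the proof visits all $5n$ vertices once and only moves along edges. The vertex encoding as (colour, index) pairs and the sample value $n = 4$ are assumptions for illustration.

```python
def adjacent(u, v):
    """Adjacency in the Lemma 4 graph; vertices are (colour, index) pairs."""
    (cu, iu), (cv, iv) = u, v
    if u == v:
        return False
    if cu == "black" or cv == "black":
        return True                      # black is joined to every other vertex
    if cu == cv == "green":
        return iu // 2 == iv // 2        # only the two vertices of one pair
    return False                         # red-red and red-green: no edge


def lemma4_circuit(n):
    """Circuit from the proof: black, red, ..., then black, green, green, ..."""
    tour = []
    for i in range(n):
        tour += [("black", i), ("red", i)]
    for j in range(n):
        tour += [("black", n + j), ("green", 2 * j), ("green", 2 * j + 1)]
    return tour


def is_hamiltonian_circuit(tour, n_vertices):
    closed = tour + tour[:1]             # return to the starting vertex
    return (len(tour) == n_vertices == len(set(tour))
            and all(adjacent(a, b) for a, b in zip(closed, closed[1:])))


n = 4
valid = is_hamiltonian_circuit(lemma4_circuit(n), 5 * n)
```

Each of the $2n$ black vertices is used exactly once as a separator between the $n$ red vertices and the $n$ green pairs, which is why the circuit closes.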

Lemma 5. If graph $G = (V, E)$, $|V| = 5n$, satisfies the following requirements:

• there are $n+2$ mutually not connected (red) vertices,

• there are $2n-2$ green vertices not connected with red ones, green vertices are
grouped in pairs and each pair is connected by an edge,

• the subgraph induced by the remaining $2n$ (black) vertices is a complete graph and all
black vertices are connected to all red and green vertices,

then the Hamiltonian circuit problem is not solvable.

Proof. The red vertices and the pairs of green vertices are mutually not
connected. The only way to get from one red vertex to another (or from one green
pair to another) is through some black vertex. There are $2n$ black vertices in the graph,
but the $n + 2$ red vertices and the $n - 1$ green pairs make $2n + 1$ altogether. So at least
one of the black vertices would have to be used twice, which is not allowed in a Hamiltonian
circuit.
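The counting argument can be cross-checked by exhaustive search on the smallest interesting instance. The sketch below builds the graphs of Lemma 4 and Lemma 5 for $n = 2$ (10 vertices each) and searches for a Hamiltonian circuit by depth-first search. The function names and the vertex labels are assumptions for illustration, and the search is only practical for very small graphs.

```python
def build_graph(n_red, n_green_pairs, n_black):
    """Adjacency sets for the red/green/black graphs of Lemmas 4 and 5."""
    red = [("r", i) for i in range(n_red)]
    green = [("g", i) for i in range(2 * n_green_pairs)]
    black = [("b", i) for i in range(n_black)]
    adj = {v: set() for v in red + green + black}
    for j in range(n_green_pairs):       # one edge inside each green pair
        a, b = green[2 * j], green[2 * j + 1]
        adj[a].add(b)
        adj[b].add(a)
    for b in black:                      # black is joined to every other vertex
        for v in adj:
            if v != b:
                adj[b].add(v)
                adj[v].add(b)
    return adj


def has_hamiltonian_circuit(adj):
    """Exhaustive depth-first search; only sensible for very small graphs."""
    start = next(iter(adj))
    total = len(adj)

    def extend(v, visited):
        if len(visited) == total:
            return start in adj[v]       # can the path be closed into a circuit?
        return any(extend(w, visited | {w}) for w in adj[v] if w not in visited)

    return extend(start, {start})


n = 2
lemma4 = has_hamiltonian_circuit(build_graph(n, n, 2 * n))          # circuit exists
lemma5 = has_hamiltonian_circuit(build_graph(n + 2, n - 1, 2 * n))  # no circuit
```

In the Lemma 5 graph there are $2n + 1 = 5$ red vertices and green pairs to separate but only $2n = 4$ black vertices, so the search finds nothing.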

Theorem 9. Hamiltonian circuit problem requires $\Omega(n^{1.5})$ quantum queries.

Proof. We construct the sets $A$ and $B$ for the usage of Theorem 1. The set $A$
consists of all graphs $G$ satisfying the requirements of Lemma 4. The value of the
function corresponding to the Hamiltonian circuit problem is 1. (This follows
from Lemma 4.) The set $B$ consists of all graphs $G$ satisfying the requirements
of Lemma 5. The value of the function corresponding to the Hamiltonian circuit
problem is 0. (This follows from Lemma 5.)

From each graph $G \in A$, we can obtain $G' \in B$ by disconnecting any one of
the edges which connect the green vertices. Hence $m = n = \Theta(n)$. From each
graph $G' \in B$, we can obtain $G \in A$ by connecting any two red vertices. Hence
$m' = (n + 2)(n + 1) = \Theta(n^2)$.

By Theorem 1, the quantum query complexity is $\Omega(\sqrt{n \cdot n^2}) = \Omega(n^{1.5})$.

The same idea proves Theorem 10.

Theorem 10. Directed Hamiltonian circuit requires $\Omega(n^{1.5})$ quantum
queries.

Lemma 6. If graph $G = (V, E)$, $|V| = 5n$, satisfies the requirements of Lemma 5,
then the Hamiltonian path problem is solvable.

Proof. We denote the black vertices $m_1$ to $m_{2n}$. The red vertices are denoted $k_1$ to $k_{n+2}$,
and the pairs of green vertices $k_{n+3}$ to $k_{2n+1}$. The sequence $k_1 m_1 \ldots k_{n+2} m_{n+2} k_{n+3} m_{n+3} \ldots
m_{2n} k_{2n+1}$ (i.e. red, black, ..., red, black, green, green, black, ..., black, green,
green) satisfies the Hamiltonian path problem.

Lemma 7. If graph $G = (V, E)$, $|V| = 5n$, satisfies the following requirements:

• there are $n+4$ mutually not connected (red) vertices,

• there are $2n-4$ green vertices not connected with red ones, green vertices are
grouped in pairs and each pair is connected by an edge,

• the subgraph induced by the remaining $2n$ (black) vertices is a complete graph and all
black vertices are connected to all red and green vertices,

then the Hamiltonian path problem is not solvable.







Proof. The proof is analogous to that of Lemma 5.

Theorem 11. Hamiltonian path requires $\Omega(n^{1.5})$ quantum queries.

Proof. We construct the sets $A$ and $B$ for the usage of Theorem 1. The set $A$
consists of all graphs $G$ satisfying the requirements of Lemma 6. The value of
the function corresponding to the Hamiltonian path problem is 1. The set $B$
consists of all graphs $G$ satisfying the requirements of Lemma 7. The value of
the function corresponding to the Hamiltonian path problem is 0.

From each graph $G \in A$, we can obtain $G' \in B$ by disconnecting any one
of the edges which connect the green vertices. Hence $m = n - 1 = \Theta(n)$. From
each graph $G' \in B$, we can obtain $G \in A$ by connecting any two red vertices.
Hence $m' = (n + 4)(n + 3) = \Theta(n^2)$.

By Theorem 1, the quantum query complexity is $\Omega(\sqrt{n \cdot n^2}) = \Omega(n^{1.5})$.

Theorem 12. Travelling salesman problem requires $\Omega(n^{1.5})$ quantum
queries.

Proof. The Hamiltonian circuit problem can be easily reduced to the Travelling
salesman problem, by taking the distance equal to 1 for every pair of cities joined by
an edge (and larger than 1 otherwise) and $B$ equal to the number of cities.
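The reduction behind this proof can be sketched in a few lines. The version below is one common way to set it up and is an assumption rather than the authors' exact construction: pairs of cities joined by an edge get distance 1, all other pairs get distance 2, and the bound $B$ is the number of cities. The function names and the two toy graphs are illustrative.

```python
from itertools import permutations


def hc_to_tsp(vertices, edges):
    """Map a Hamiltonian circuit instance to a Travelling salesman instance."""
    edge_set = {frozenset(e) for e in edges}
    dist = {(u, v): 1 if frozenset((u, v)) in edge_set else 2
            for u in vertices for v in vertices if u != v}
    return dist, len(vertices)            # distances and the bound B


def has_tour_within(vertices, dist, bound):
    """Brute-force decision procedure; fine for a handful of cities."""
    first, rest = vertices[0], vertices[1:]
    for perm in permutations(rest):
        tour = (first,) + perm + (first,)
        if sum(dist[a, b] for a, b in zip(tour, tour[1:])) <= bound:
            return True
    return False


square = ["a", "b", "c", "d"]
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
star_edges = [("a", "b"), ("a", "c"), ("a", "d")]

yes = has_tour_within(square, *hc_to_tsp(square, cycle_edges))   # 4-cycle: tour of length 4
no = has_tour_within(square, *hc_to_tsp(square, star_edges))     # star: every tour is longer
```

A tour of length at most $B$ can use only distance-1 steps, so it exists exactly when the original graph has a Hamiltonian circuit; a lower bound for Hamiltonian circuit therefore carries over to Travelling salesman.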

Theorem 13. The lower bound for Hamiltonian circuit cannot be improved
by Ambainis’ method.

Proof. We use Theorem 3. Let the Boolean function $f$ describe Hamiltonian
circuit. $ND_1(f) = O(n)$, because it suffices to guess the sequence of vertices
and ask the edge for every pair of subsequent vertices. $ND_0(f) = O(n^2)$, because
it suffices to check that a graph satisfies the conditions of Lemma 5.

Hence $\sqrt{ND_1(f) \cdot ND_0(f)} = O(n^{1.5})$.

Lemma 8. If tree $G = (V, E)$, $|V| = 3n+1$, $K = n$, satisfies the following
requirements:

• the root vertex is connected to $n$ red vertices,

• each of the $n$ red vertices is connected to exactly one green vertex,

• each of the $n$ black vertices is connected to exactly one red vertex, but any red
vertex can be connected to any number of black vertices,

then the Dominating set problem is solvable.

Proof. We have a tree with three layers: the root layer, the layer of red vertices and the
layer of green and black vertices. The middle layer, which consists of $n$ red vertices,
satisfies the Dominating set problem.

Lemma 9. If tree $G = (V, E)$, $|V| = 3n+1$, $K = n$, satisfies the following
requirements:

• the root vertex is connected to $n$ red vertices,

• each of the $n$ red vertices is connected to exactly one green vertex,

• each of $n-1$ black vertices is connected to exactly one red vertex, and there is
one black vertex that is connected to another black vertex,

then the Dominating set problem is not solvable.

Proof. We have a tree with four layers: the root layer, the layer of red vertices, the
layer of green and black vertices, and a fourth layer with only one black vertex.
That means that we need at least one more vertex in the dominating set in addition
to the middle layer, which gives us $n+1$ vertices.
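Lemmas 8 and 9 can be checked by brute force on a small tree. The sketch below builds one tree of each kind for $n = 3$ (10 vertices) and computes the size of a smallest dominating set by trying subsets in order of increasing size. The function names, the vertex labels and the round-robin attachment of black vertices are assumptions for illustration.

```python
from itertools import combinations


def lemma_tree(n, chained_black=False):
    """Tree of Lemma 8 (or of Lemma 9 if chained_black) as an adjacency dict.

    Black vertices are spread round-robin over the red ones, which is one
    of the many attachments the lemmas allow.
    """
    adj = {"root": set()}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for i in range(n):
        link("root", ("red", i))
        link(("red", i), ("green", i))
    for i in range(n - 1):
        link(("red", i % n), ("black", i))
    # last black vertex: on a red vertex (Lemma 8) or on another black one (Lemma 9)
    link(("black", 0) if chained_black else ("red", n - 1), ("black", n - 1))
    return adj


def min_dominating_set_size(adj):
    """Brute force over subsets by increasing size; small trees only."""
    vertices = list(adj)
    for size in range(1, len(vertices) + 1):
        for cand in combinations(vertices, size):
            chosen = set(cand)
            if all(v in chosen or adj[v] & chosen for v in vertices):
                return size


n = 3
size8 = min_dominating_set_size(lemma_tree(n))                      # equals n
size9 = min_dominating_set_size(lemma_tree(n, chained_black=True))  # equals n + 1
```

In the Lemma 9 tree each green leaf forces one vertex of its red-green pair into the set, and the black leaf hanging from another black vertex forces one more, which is why $K = n$ is no longer enough.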

Theorem 14. Dominating set problem for trees requires $\Omega(n)$ quantum
queries.

Proof. We construct the sets $X$ and $Y$ for the usage of Theorem 2. The set
$X$ consists of all graphs $G$ satisfying the requirements of Lemma 8. The set $Y$
consists of all graphs $G$ satisfying the requirements of Lemma 9.

From each graph $G \in X$, we can obtain $G' \in Y$ by removing any one of the
edges which connect a black and a red vertex (there are $n$ ways to do
it) and connecting that black vertex to any other black vertex ($n$ ways to do it).
Hence $m = \Theta(n^2)$. From each graph $G' \in Y$, we can obtain $G \in X$ by removing
the edge connecting two black vertices (1 way to do it) and connecting the free
black vertex to any red one ($n$ ways to do it). Hence $m' = \Theta(n)$.

Now we find $\max(l_i \cdot l'_i)$. For any edge connecting a black and a red vertex, $l_i =
n$, because when we remove a black-red edge, we can do that in $n$ combinations
of remove-and-create operations, and $l'_i = 1$, because we can build a black-red edge
using only one combination; for any edge connecting two black vertices, $l_i = 1$
and $l'_i = n$ by the same idea. Thus $\max(l_i \cdot l'_i) = n$.

By Theorem 2, the quantum query complexity is $\Omega\left(\sqrt{\frac{n^2 \cdot n}{n}}\right) = \Omega(n)$.

Theorem 15. The lower bound for Dominating set for trees cannot be
improved by Ambainis’ method.

Proof. We use Theorem 3. Let the Boolean function $f$ describe Dominating set
for trees. $ND_1(f) = O(n)$, because it suffices to guess the dominating set of vertices
and the connections to it from other vertices. $ND_0(f) = O(n)$, supposing that
the tree is given with an adjacency list, because there are only $n-1$ edges in the tree.
Hence $\sqrt{ND_1(f) \cdot ND_0(f)} = O(n)$.

Lemma 10. If graph $G = (V, E)$, $|V| = n$, $K = 3n/4 - 1$, satisfies the following
requirements:

• there are $n/2 + 2$ red vertices, pairwise connected,

• there are $n/2 - 2$ black vertices, not mutually connected,

then the Dominating set problem is solvable.

Proof. $n/2+2$ red vertices make $n/4+1$ dominating vertices; adding $n/2-2$ black
vertices gives $3n/4-1$ dominating vertices, which satisfies the Dominating set
problem.







Lemma 11. If graph $G = (V, E)$, $|V| = n$, $K = 3n/4 - 1$, satisfies the following
requirements:

• there are $n/2$ red vertices, pairwise connected,

• there are $n/2$ black vertices, not mutually connected,

then the Dominating set problem is not solvable.

Proof. $n/2$ red vertices make $n/4$ dominating vertices; adding $n/2$ black
vertices gives $3n/4$ dominating vertices, which is more than $K$ and does not satisfy
the Dominating set problem.

Theorem 16. Dominating set problem requires $\Omega(n^{1.5})$ quantum queries.

Proof. We construct the sets $A$ and $B$ for the usage of Theorem 1. The set $A$
consists of all graphs $G$ satisfying the requirements of Lemma 10. The set $B$
consists of all graphs $G$ satisfying the requirements of Lemma 11.

From each graph $G \in A$, we can obtain $G' \in B$ by removing any one of the
edges which connect red vertices. Hence $m = (n/4 + 1) = \Theta(n)$. From each
graph $G' \in B$, we can obtain $G \in A$ by connecting any two black vertices. Hence
$m' = (n/2)(n/2 - 1) = \Theta(n^2)$.

By Theorem 1, the quantum query complexity is $\Omega(\sqrt{n \cdot n^2}) = \Omega(n^{1.5})$.

Theorem 17. The lower bound for Dominating set cannot be improved by
Ambainis’ method.

Proof. We use Theorem 3. Let the Boolean function $f$ describe Dominating set.
$ND_1(f) = O(n)$, because it suffices to guess the dominating set of vertices and the
connections to it from other vertices. $ND_0(f) = O(n^2)$, because it suffices to
read all edges and run a deterministic algorithm.

Hence $\sqrt{ND_1(f) \cdot ND_0(f)} = O(n^{1.5})$.

References 

1. Ambainis, A.: Quantum Lower Bounds by Quantum Arguments. Journal of
Computer and System Sciences, 64 (2002) 750-767

2. Ambainis, A.: Personal communication (2003)

3. Bennett, C.H., Bernstein, E., Brassard, G., Vazirani, U.V.: Strengths and
Weaknesses of Quantum Computing. SIAM Journal on Computing, 26 (1997) 1510-1523

4. Buhrman, H., de Wolf, R.: Complexity Measures and Decision Tree Complexity: A
Survey. Theoretical Computer Science, 288(1) (2002) 21-43

5. Freivalds, R., Winter, A.: Quantum Finite State Transducers. In: SOFSEM 2001
(2001) 233-242

6. Grover, L.: A Fast Quantum Mechanical Algorithm for Database Search.
Proceedings of the 28th ACM Symposium on Theory of Computing (1996) 212-219

7. Gruska, J.: Quantum Computing. McGraw-Hill (1999)

8. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information.
Cambridge University Press (2000)




A Model of Versioned Web Sites* 



Maria Bielikova and Ivan Noris 



Faculty of Informatics and Information Technologies, Slovak University of Technology 
Ilkovicova 3, 812 19 Bratislava, Slovakia 

bielik@fiit.stuba.sk
http://www.fiit.stuba.sk/~bielik



Abstract. In this paper we present a model of versioned web sites which
is aimed at building a web site configuration. The web site configuration
is a consistent version of the web site and serves for navigation purposes.
We exploit the fact that the versioning of web sites is in many aspects
similar to versioning of software systems (and their components). On
the other hand, specific characteristics related to the web environment
and web sites in particular are considered. The web site is modelled by
an AND/OR type graph. The model serves as a useful abstraction
simplifying the process of configuration building. Since configuration building
is essentially a graph search, a method for selecting a proper version is indispensable.
The presented approach is best suited for web sites where several variants
of web pages exist. It is advantageous, for example, for presentation of
multilingual web sites. We briefly discuss a developed software tool for
versioning and navigation on a multilingual web site, which is based on
the proposed model of a versioned web site.



1 Introduction 

Web sites evolve by changing their content and structure over time. Change
is inevitable. However, the web today as a rule supports only one version of
a document - the current one. The requirement to store and access previous
versions of the web content, retrieve the history of the content, annotate revisions
with comments about the changes, or navigate through a versioned web site was
explicitly noted already in [2]. This requirement follows the evolution in the area
of hypermedia research, where version control has been identified as a critically
important task [8].

The literature lists many reasons for saving the history of an object (be
it a software component, hypermedia node, or web page), including distributed
and collaborative development, keeping old versions for later use, assuring the
safety of recent work against various kinds of accidents, and preserving cited work in
the original state. Versions of a document do not serve only to record change. For

* This work has been supported by the Grant Agency of Slovak Republic grant No.
VG1/0162/03 “Collaborative accessing, analysis and presentation of documents in
internet environment using modern software tools”.



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 151-162, 2004. 
(c) Springer- Verlag Berlin Heidelberg 2004 







M. Bielikova and I. Noris 



example, to be accessible to a large audience, web sites often contain information
written in more than one language; each language form can be considered a version.

Evolution of information presented on the web and related problems are discussed
in several works [18], [14], [4], [17]. Content management systems often
provide versioning in the sense of software systems versioning [6]. A user (a content
manager) can store snapshots of his work (revisions) and come back to
them later, or develop the web site in collaboration with a distributed team. Such
a scheme can be sufficient from a developer's point of view, but not from the point
of view of a user (reader), who needs to navigate through the versioned
web site.

In this paper we address the problem of computer support for navigation
within a versioned web site. We propose a model of a versioned web site
which is simple, but still sufficiently rich to reflect the principal relations and
properties which are decisive in web site configuration building. The web site
configuration is a consistent version of the web site that serves for navigation
purposes. We have adopted the AND/OR graph model formalized for software
systems in [3]. The semantics of the model is specified according to the specific
properties of web sites.

The central notions of the proposed model include the following: web pages are
characterized by properties defined at three levels (family level, variant level and
revision level); links are established with respect to the family of the target page rather
than to the pages themselves; page-to-page connections are established at the time
of configuration building; several revisions (of different page variants) can be
included in the configuration.



2 Modelling a Web Site 

A model is used to express a web site structure, respecting in our case the point of 
view of navigation of the versioned web site (i.e. building its configuration). The 
web site configuration can be used for off-line browsing within a selected version
of the versioned web site.



2.1 Elements of the Model 

A web site consists of several independent parts (nodes) interconnected by links. 
Each node - a web page - primarily comprises the content to be presented.
Obviously, there exists a mechanism for presentation layout definition (either 
embedded in each node or represented separately in one place for several nodes, 
or the whole web site). The web page often comprises both the content in the 
form of a text or other media and chunks of programming source code which 
provide dynamic content. 

We consider two scenarios for web page version creation [5], [6], [18]. First,
versions are created to represent alternate forms of a page. Such 'parallel' versions,
or variants, are frequently results of alternative realizations of the same



A Model of Versioned Web Sites 






concept (e.g., multilingual variants). The variants can evolve independently. Second,
versions are created to represent improvements of previous ones, or as
modifications caused by an error correction, content enhancement, and/or adaptation
to changes in the environment. Such 'serial' versions, or revisions, are
frequently results of modifications of the same variant. A family of web pages
comprises all web pages which are versions of one another. Note that the concept
of parallel versions induces an equivalence relation within the set of web pages.
We shall use the term version in cases where either a variant or a revision may be
meant.

Let us formulate the notions more formally now. Let $P_S$ be a set of web
pages of a web site $S$. Then a binary relation $\mathit{is\_version}_S \subseteq P_S \times P_S$ is given as
the reflexive and transitive closure of another binary relation which is defined by
elementary transformations describing such modifications of web pages that they
can still be considered to be expressing essentially the same content. Relation
$\mathit{is\_version}_S$ is reflexive, symmetric and transitive.

A set of all equivalence classes induced by the $\mathit{is\_version}_S$ relation is denoted
$F_S$ and called a set of families of web pages of the web site $S$. An element of
$F_S$ is called a family of web pages. In other words, a family consists of web pages
which are related by the relation $\mathit{is\_version}_S$. Usually such web pages are presented by
means of the relation $\mathit{is\_developed\_from}$ as a so-called version graph [11], [10], [6].
Nodes denote various versions as they are created; an arrow from version A to
version B indicates that B was created from A. All the web pages included in
the version graph form a family of web pages.
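
The grouping of pages into families can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the version graph is given as a list of (source, target) pairs of page identifiers; the identifiers and the function name `families` are hypothetical and do not come from the paper. Families are computed as connected components of the version graph with edge direction ignored, which yields the equivalence classes described above.

```python
from collections import defaultdict


def families(pages, developed_from):
    """Partition pages into families (equivalence classes).

    pages: iterable of page identifiers (hypothetical naming scheme).
    developed_from: list of (a, b) pairs meaning "b was created from a".
    """
    parent = {p: p for p in pages}

    def find(p):
        # Follow parent pointers to the class representative (path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Each version-graph edge merges the classes of its two endpoints.
    for a, b in developed_from:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in pages:
        groups[find(p)].add(p)
    return sorted(sorted(g) for g in groups.values())


pages = ["index.en.1", "index.en.2", "index.sk.1", "about.en.1"]
edges = [("index.en.1", "index.en.2"), ("index.en.1", "index.sk.1")]
result = families(pages, edges)
print(result)
```

In this sketch the three `index` pages end up in one family and the unrelated `about` page forms a family of its own.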

We define for each versioned web page a set of properties (using attribute-value
pairs) and its content (usually HTML text, scripts, and embedded media).
Based on that, we consider variants as sets of those web pages which share
certain properties. This conceptual design choice does not impose any serious
limitations in most cases. On the contrary, it provides considerable flexibility
in the specification of versioned web site navigation. It offers a useful abstraction
that should simplify the process of navigation. In order to describe variants,
we define a binary relation $\mathit{is\_variant}_S$ which determines a set of web pages
with the same subset of properties within a given family. The binary relation
$\mathit{is\_variant}_S \subseteq P_S \times P_S$ is defined by:

$$x \; \mathit{is\_variant}_S \; y \Leftrightarrow x \; \mathit{is\_version}_S \; y \wedge x.\mathit{VariantAttr} = y.\mathit{VariantAttr}$$

Variants are important to simplify management of web page versions when
selecting a revision of some page, or during automatic navigation within the
versioned web site. We can treat a whole group of web pages in a uniform way
due to the fact that all of them have the relevant properties defined as equal
(e.g., the content is written in English).
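
As a hedged illustration of the variant relation, the following Python sketch groups the pages of one family by the values of the attributes flagged as variant attributes. The dictionary layout of a page and the attribute names `Language` and `Author` are assumptions made for the example, not the data format of the paper.

```python
from collections import defaultdict


def variants(family, variant_attrs):
    """Group pages of one family into variants.

    Two pages fall into the same variant when they agree on every
    attribute listed in variant_attrs (the VariantAttr of the text).
    """
    groups = defaultdict(list)
    for page in family:
        key = tuple(page["attrs"].get(a) for a in variant_attrs)
        groups[key].append(page["id"])
    return dict(groups)


family = [
    {"id": "index.en.1", "attrs": {"Language": "en", "Author": "mb"}},
    {"id": "index.en.2", "attrs": {"Language": "en", "Author": "in"}},
    {"id": "index.sk.1", "attrs": {"Language": "sk", "Author": "in"}},
]
by_language = variants(family, ["Language"])
print(by_language)
```

Here `Language` acts as the only variant attribute, so the two English revisions form one variant; flagging `Author` as a variant attribute as well would split them into separate variants.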

One consequence of our design decision of taking variants to be sets of web 
pages is that from the two kinds of versions of web pages, only revisions are 
left to represent actual single web pages. Figure 1 depicts the above defined 
relationships. 

The distribution of web pages to variants depends on the decision as to which properties
are considered as variant properties and which are considered as revision








Fig. 1. Family-Variant-Revision relationships 



properties, i.e. unique properties of the actual web page. The decision about distributing
attributes is left open to the developer in our approach because it depends
on the project, its size, problem domain, etc. A typical recommendation
applicable in many cases is to consider as variant attributes specific characteristics
of the user knowledge (e.g., novice, intermediate, advanced), characteristics
of the developer environment (e.g., HTML version), or characteristics of the user
environment (e.g., browser used). This means that a change of these properties
leads to a new variant. Properties related to the development process such as
state, change description, author, date, time are often considered as revision
attributes, i.e. their change leads to a new revision.

2.2 Model Formulation 

The concepts introduced above allow us to formulate a model of a versioned
web site which supports navigation. Determining versions for navigation
resembles configuration building in software configuration management. From
the point of view of a family, a model should involve families and the variants
included in them. Links from a family to all its variants are defined by the relation
$\mathit{has\_variant}_S \subseteq F_S \times V_S$: $x \; \mathit{has\_variant}_S \; y \Leftrightarrow y \subseteq x$.

From the point of view of a variant, the model should represent links to all
those families which are referred to in revisions of that variant. When building a
configuration, at least one variant must be selected for each family already
included in the configuration. For each variant already included in the configuration,
precisely one revision must be included. For all selected revisions, all the
families related by links to that revision must be included. A configuration comprising
more than one variant of a web page is unavoidable, for example, when
building the configuration of a multilingual web site with specific requirements
for English and Slovak language pages.






Our method of modelling a web site $S$ is to describe it by an oriented graph,
with nodes representing families and revisions in such a way that these two kinds
of nodes alternate on every path. Let $F_S$ be a set of families of web pages and $P_S$
be a set of web page revisions of a web site $S$. Let $FN$ be a set of names and
$\mathit{f\_name}_S : F_S \to FN$ an injective function which assigns a unique name to each
family of a web site $S$. Let $A \subseteq P_S \times FN$ be a binary relation defined as:

$$e_1 \; A \; e_2 \Leftrightarrow \exists r \, (r \in e_1.\mathit{Link} \wedge r.\mathit{FamilyId} = e_2)$$

Let $O \subseteq FN \times P_S$ be a binary relation defined as:

$$e_1 \; O \; e_2 \Leftrightarrow e_2 \in \mathit{f\_name}_S^{-1}(e_1)$$

We define a model of a web site $S$ to be an oriented graph $M_S = (N, E)$, where
$N = FN_S \cup P_S$ is a set of nodes with $FN_S = \{ x \mid x \in FN \wedge \mathit{f\_name}_S^{-1}(x) \in F_S \}$,
and $E = A \cup O$ is a set of edges such that every maximal connected subgraph
has at least one root.

We remark that the binary relation A stands for hyperlinks (relating revisions
to families) and the relation O mirrors the $\mathit{has\_revision}$ relation. Such an A/O
graph model refers to the previously introduced notions (revision, variant, family).
Variants are covered in the model implicitly through sets of A-nodes which
represent revisions.

The usual interpretation is that A-nodes are origins of edges leading to nodes, 
all of which must be considered provided the A-node is under consideration 
(logical AND). Similarly, O-nodes are origins of edges leading to nodes, from 
among which exactly one must be considered provided the O-node is under 
consideration (logical OR). 
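
A minimal Python sketch of such a graph follows. It assumes that each revision is described by the family it belongs to and by the names of the families it links to; the two dictionaries and the function name `build_model` are hypothetical inputs chosen for the example. A-edges lead from a revision to every linked family (all of them must be considered), and O-edges lead from a family name to each of its revisions (one of them is chosen).

```python
def build_model(links_of, family_of):
    """Build the A/O graph as a (nodes, a_edges, o_edges) triple.

    links_of:  revision id -> list of family names the revision links to.
    family_of: revision id -> name of the family the revision belongs to.
    """
    # A-edges: revision -> family name (hyperlinks, logical AND).
    a_edges = {(rev, fam) for rev, fams in links_of.items() for fam in fams}
    # O-edges: family name -> revision (has_revision, logical OR).
    o_edges = {(fam, rev) for rev, fam in family_of.items()}
    nodes = set(family_of) | set(family_of.values())
    return nodes, a_edges, o_edges


links_of = {
    "index.en.1": ["about"],
    "index.sk.1": ["about"],
    "about.en.1": [],
    "about.sk.1": [],
}
family_of = {
    "index.en.1": "index",
    "index.sk.1": "index",
    "about.en.1": "about",
    "about.sk.1": "about",
}
nodes, a_edges, o_edges = build_model(links_of, family_of)

# Families and revisions alternate on every path: each edge joins
# exactly one family node and one revision node.
revisions = set(family_of)
alternating = all((s in revisions) != (t in revisions)
                  for s, t in a_edges | o_edges)
print(len(nodes), len(a_edges), len(o_edges), alternating)
```

Because links point to family names rather than to revisions, adding a new revision of `about` in this sketch only adds one O-edge and leaves the `index` revisions untouched.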

An example of a web site model is depicted in Figure 2. For the sake of clarity
we also depict variants and the relation $\mathit{has\_variant}$. One possible configuration for
the content written in the Slovak language is highlighted (bold).

The proposed model is a version-family kind of model (an analogy of the intertwined
model known in software engineering [6]). The navigation relations are defined
for each revision and the revisions are connected to the web page families only.
Therefore, when a new version of the target of a relation is created, the relation itself
is not affected, so there is no need to create a new version of the source component.
Defining links between revisions and families is a compromise between the complexity of
the model (and the subsequent support for navigation) and its flexibility.

If all links were defined on the variant level (all revisions within a variant
share the same links), we would be able to exploit the model on several levels
of abstraction: as an abstract model containing families and variants, as a generic
configuration containing selected variants, and as a bound configuration containing
interconnected revisions. However, the requirement to define links on the variant
level is in many cases too limiting. It can be successfully used for multilingual
sites with only a few revisions.

3 Navigation and Configuration Building 

The process of navigation within a versioned web site is based on determining the target
of the selected link. The navigation procedure is defined as a graph search,







where the graph constitutes the model of the versioned web site. We use version
selection filters, which are conditions applied to all versions of the appropriate families
of web pages [13].

Any version can have its own independent attribute structure, so it can
be modified without affecting other versions. This makes the attribute structure
sufficiently flexible. On the other hand, some attributes (e.g., Language, Browser)
are supposed to be shared between several revisions of the web page (or web page
families). Several types of attributes are defined (e.g., string, number, time, list,
set). A set of system attributes further improves the management of meta-data
related to the versioning. The system attributes are set automatically by the tool
that maintains the model of the versioned web site. We distinguish several types of system
attributes:

Read-only system attributes: their values are set only once after creating a new 
revision (e.g., InsertTime, InsertUser). The values cannot be changed. 

Auto-updating system attributes: their values are updated automatically after
each significant change of the version (e.g., ChangeTime, ChangeUser).

Default-value system attributes: they have some predefined values, but can be 
changed later (e.g., VersionCodename, Owners, Author, Keywords). 






In the proposed model we use variant attributes to distinguish between revisions
belonging to different variants. Each variant groups revisions with the same values of the
variant attributes. In the course of the version selection process the variant attributes
are evaluated first. Any attribute may be flagged as a variant attribute. The
revisions are grouped into variants according to their properties.

The version selection is a two-step process: first, the corresponding variant 
and then the revision is selected. Thus, the variants exist only on a “logical 
level”. Version selection is described and formalized in [12]. We concentrate in 
this paper on specific features applicable for version selection of web pages. 

3.1 Version Selection Filters 

A version of a web page is selected using a set of selection filters. A selection filter
is represented by a logical expression which operates on version properties.

A limitation of the proposed approach is that selection filters do not guarantee
that exactly one version is selected. If the sequence of filters is too strict,
none of the versions will match. On the other hand, more than one version
could match loose filters. To handle such cases, an implementation of the proposed
approach should allow the filters to be refined (added, modified or even removed), at least as
a user-initiated action. Internal restrictions based on, e.g., last modification
time can also be implemented to filter out all but one matching version.
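
The behaviour of a filter sequence, including the two failure cases just described, can be sketched in Python as follows. The representation of a filter as a predicate over an attribute dictionary is an assumption of this example; the ChangeTime attribute is taken from the system attributes above and used here as the internal tie-breaking restriction.

```python
def select_version(versions, filters):
    """Return the single selected version, or None if no version matches.

    versions: list of attribute dictionaries (hypothetical layout).
    filters:  list of predicates; a version must satisfy all of them.
    """
    matches = [v for v in versions if all(f(v) for f in filters)]
    if not matches:
        # Filters too strict: the caller should let the user relax them.
        return None
    # Filters too loose: keep only the most recently changed match.
    return max(matches, key=lambda v: v["ChangeTime"])


versions = [
    {"id": "index.en.1", "Language": "en", "ChangeTime": 1},
    {"id": "index.en.2", "Language": "en", "ChangeTime": 2},
    {"id": "index.sk.1", "Language": "sk", "ChangeTime": 3},
]


def is_english(v):
    return v["Language"] == "en"


def is_german(v):
    return v["Language"] == "de"


chosen = select_version(versions, [is_english])
missing = select_version(versions, [is_german])
print(chosen["id"], missing)
```

Returning `None` rather than raising an error leaves room for the user-initiated refinement of filters that the text asks for.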

Version selection filters can be of two kinds: user-defined filters, and default 
filters. The option to save a sequence of filters as a “named configuration spec- 
ification” makes the version selection mechanism more flexible and allows its 
reuse. 

User-defined filters. The user-defined filters can be entered by any visitor of
the versioned web site. The filters are defined explicitly at the user's request. The
number of filters is not limited in our proposal; however, some implementation
limits are to be expected. The user-defined filters can be defined for any
attribute and any required values.

Default filters. Default filters are defined by the web page author and are
automatically applied when accessing a versioned web page. The need for default
filters stems from the fact that version-unaware users can easily visit the web
site. Such users are not expected to be concerned with version control and may
feel confused when dealing with the attributes and configuration specifications.
On the other hand, it may be useful for the content author to guide the visitors
in some way.

We distinguish static and dynamic default filters. A static default filter specifies
a version in such a way that its evaluation always leads to the same
version. The dynamic default filters allow the author to set conditions which are
automatically applied when a version within the family is to be selected. The
simplest dynamic filter selects the last inserted revision.

The dynamic default filters are flexible and allow the author to write conditions
which ensure a version selection according to the current browsing session.
The browser type, preferred language, client's top-level domain etc. can be used






to determine the user's requirements. The dynamic default filters can also be used
to distinguish local and remote users. They are evaluated automatically if the
filter trigger has been activated. The trigger's actual value is compared with the
author-supplied value. If the values match, the trigger is activated and the filter
based on the trigger applies. If several filters use the same trigger, they are
defined separately. In the developed software tool (see next section) we have used
environment variables as filter triggers (e.g., the HTTP_USER_AGENT variable stores
the identification of the current client; HTTP_ACCEPT_LANGUAGE contains accepted language
codes sorted by the client's preference). These variables are set automatically
by the web server.
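
The trigger mechanism can be sketched in Python under several stated assumptions: a rule is a tuple of (environment variable, author-supplied value, attribute, attribute value), "matching" is simplified to a case-insensitive substring test, and the first matching rule for an attribute wins. None of these details is specified in the paper, and the `Layout` attribute is hypothetical; only the variable names and `ContentLanguage` come from the text.

```python
def triggered_filters(environ, rules):
    """Return the attribute constraints whose triggers are activated.

    environ: mapping of environment variables set by the web server.
    rules:   list of (variable, expected, attribute, value) tuples,
             evaluated in order; the first match per attribute wins.
    """
    selected = {}
    for variable, expected, attribute, value in rules:
        actual = environ.get(variable, "")
        # Simplified matching: substring test, ignoring letter case.
        if expected.lower() in actual.lower() and attribute not in selected:
            selected[attribute] = value
    return selected


rules = [
    ("HTTP_ACCEPT_LANGUAGE", "sk", "ContentLanguage", "Slovak"),
    ("HTTP_ACCEPT_LANGUAGE", "en", "ContentLanguage", "English"),
    ("HTTP_USER_AGENT", "Lynx", "Layout", "text-only"),
]
environ = {
    "HTTP_ACCEPT_LANGUAGE": "en-US,en;q=0.9",
    "HTTP_USER_AGENT": "Mozilla/5.0",
}
active = triggered_filters(environ, rules)
print(active)
```

A production implementation would parse the language preference list properly instead of using a substring test, which can misfire on codes that merely contain the expected letters.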

A specified sequence of default selection filters can be set at logon time
for certain users or groups of users. With the described mechanisms, the
presentation of a versioned web site becomes adaptive.

We have also proposed a mechanism for accessing alternatives of the currently displayed
page. This is accomplished by conditions for alternatives. The alternative
versions have different values for combinations of the specified attributes. Conditions
for alternatives together with the displaying mechanism allow a user to see
links to the corresponding versions with the alternative content (e.g. written in
a different language) on each visited page.

On a user request the value of the selected attribute in the selected filter changes
and the alternative version is provided. In the developed software tool, we use only
one attribute for updating the conditions (the ContentLanguage attribute). This
simplification is partially motivated by the purpose of the developed tool (to provide
support for multilingual sites). In addition, a greater number of attributes implies
a greater number of possible combinations of their values, i.e. the
number of alternate contents would grow exponentially.
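
The growth can be made concrete with a short worked count. It rests on an assumption not stated in the paper: alternatives are distinguished by $k$ attributes, attribute $i$ can take $n_i$ distinct values, and every combination of values is meaningful. The number of alternatives to the currently displayed page is then

$$\prod_{i=1}^{k} n_i - 1, \qquad \text{which for } n_1 = \dots = n_k = n \text{ equals } n^k - 1 .$$

For example, three languages alone give $3 - 1 = 2$ alternatives, whereas three languages, three browser types and three knowledge levels give $3^3 - 1 = 26$.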



3.2 Configuration Building 

Configuration building is the process of selecting appropriate revisions for all
web page families to be included in the configuration. The configuration can
be built for off-line content reading or by actual browsing through the versioned
web site. This process consists of four basic steps:

1. Select a starting family $f$ of web pages (the O-node in the model); obviously
the starting family is the root of the web site model.

2. Select an appropriate revision $v$ of the web page (one of the A-nodes connected
to the family $f$ by the $\mathit{has\_revision}$ relation); this step is based on the specified
version selection filters; first the variant is selected and then the revision within
this variant [13], [12].

3. Provide attached elements for revision $v$, provide versions of attached elements.

4. IF the stop condition is false THEN select the next web page to be included in
the configuration (all those families which are referred to in the revision $v$
are considered) and go to step 2.







If the configuration is created in a browser, the family considered in the next
cycle (step 4) is determined by a visitor clicking on a link. Otherwise, the next
family for processing is selected according to the search strategy adopted (e.g.,
a depth-first or breadth-first strategy).
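
The off-line case can be sketched in Python as a breadth-first search. The sketch makes several simplifying assumptions that are not the authors' implementation: exactly one revision is chosen per family (the model itself allows several variants), step 3 (attached elements) is omitted, the stop condition is "no unvisited family remains", and the selection filter is passed in as a function `choose`. All identifiers are hypothetical.

```python
from collections import deque


def build_configuration(root, revisions_of, links_of, choose):
    """Build a configuration as a mapping family name -> chosen revision.

    root:         name of the starting family (step 1).
    revisions_of: family name -> list of its revision ids (O-edges).
    links_of:     revision id -> list of linked family names (A-edges).
    choose:       version selection, returns a revision id or None (step 2).
    """
    configuration = {}
    queue = deque([root])
    while queue:  # stop condition: no family left to process
        family = queue.popleft()
        if family in configuration:
            continue  # already resolved, avoids cycles
        revision = choose(revisions_of.get(family, []))
        if revision is None:
            continue  # no version satisfies the filters
        configuration[family] = revision
        # Step 4: every family linked from the chosen revision is considered.
        queue.extend(links_of.get(revision, []))
    return configuration


revisions_of = {
    "index": ["index.en.1", "index.sk.1"],
    "about": ["about.en.1", "about.sk.1"],
    "contact": ["contact.sk.1"],
}
links_of = {
    "index.en.1": ["about"],
    "index.sk.1": ["about", "contact"],
    "about.sk.1": ["index"],
}


def choose_slovak(revisions):
    return next((r for r in revisions if ".sk." in r), None)


configuration = build_configuration("index", revisions_of, links_of, choose_slovak)
print(configuration)
```

Replacing `popleft()` with `pop()` would turn the traversal into a depth-first search; the resulting configuration is the same here because the choice of revision does not depend on the visiting order.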

4 An Application of the Model 

We developed a software tool called DiVer to support navigation on
versioned web sites [13]. DiVer implements the described model for web pages written
in HTML. Its primary purpose was the presentation of multilingual web sites. We
considered the following requirements while developing a prototype of the versioned web
site navigation tool:

1. URLs pointing to the web page from the external (unversioned) space should
remain working;

2. a reader should understand that he is viewing a versioned page; he should 
be able to access different versions of the displayed page; 

3. there must be a mechanism for navigation within the versioned space (the method
for web site configuration presented in the previous section is exploited); this
mechanism should mimic standard web navigation (as without versions);

4. web pages with alternative content (written for example in different lan- 
guages) should be easily accessible. 

The DiVer tool, similarly to the V-Web tool [14], adds to the top of a web page
a frame containing a textual depiction of the web page's version information
together with the possibility to select a version, define selection filters or see alternative
versions. The original HTML page is replaced by a new composite HTML
page which comprises a menu and the content of the selected version from an
archive of versions (see Figure 3).




Fig. 3. Versioned web page 

The version control related attributes are defined only for versions (variants, 
revisions) of the HTML-elements. In our opinion, there is no need to define 
attributes of other elements, e.g., images, because the images are included in 
the composite HTML elements and not selected into configurations separately. 
This simplification significantly influences the data that have to be stored and 
processed. Of course, the attached components are also versioned and can be 






identified and accessed by the version number. Therefore, only HTML-elements 
are represented in the model. When the HTML-element revision is selected, ap- 
propriate revisions of all attached elements are selected. The information about 
attached elements and their versions is stored in the attribute structure of the 
HTML-element revision. 

Our software tool uses RCS [15] as a revision control back end on the server
side. The graphical front end on the client side is developed in the Perl language. The
archive library is used to save and restore version attributes, which are stored
separately in XML archives. The version attribute structure is used to select the appropriate
version while creating the configuration. The “cookie” mechanism is used to
transfer the selection filters. RCS uses a check-in/check-out paradigm to create
revisions. It organizes the revisions into an ancestral graph and stores them in a
file called an archive. We suppose that a tool which implements the proposed model
will leave the version control responsibility for the files with a revision control
system such as RCS, while responsibility for maintaining the relations between
the files will reside with the tool.

The DiVer tool conforms to the WebDAV protocol (Distributed Authoring and Versioning
over the Web) [1] and its DeltaV extension (Web Versioning and Configuration
Management) [11]. However, it was not developed with the assumption
of a web server supporting such an extended HTTP protocol, because WebDAV
support in web servers was not available at the time of the tool development.
A tool for navigation in versioned web space built on our model should be
responsible for substituting the family URL (the same as for an unversioned page)
by the DeltaV stable URL.

5 Related Work 

In the hypermedia field, the problems connected to version control and configuration
management have been frequently examined and discussed [16]. A
wide variety of research has also attempted to deal with versioning issues on the
web. Examples of this work include [14], [1], [2], [11], [17].

Two basic version models are used in the area of hypermedia [7]: state-based
versioning, which maintains the version of an individual resource, and task-based
versioning, which focuses on tracking versions of complex systems as a whole.
These concepts are similar to those of state-based and change-based versioning
as known in software engineering [6]. The proposed model is oriented towards state-based
versioning. While it does not support the tracking of a set of changes, it
enables effective and efficient realization within the web environment.

Several hypermedia systems define links on the level of particular pages
(nodes) (see a comparison of hypermedia data models presented in [19]). In
our model links are defined on the version-family level. This means that the
link points to the family of web pages and is resolved at the time of configuration
building (or navigating the web site). Linking on the level of page versions can
lead to many broken links when versioned items are deleted from the public
web repository (even if this is not consistent with the idea behind versioning -






to preserve all states of an entity - deleting some versions from the public web
presentation is prevalent).

An important issue when discussing versioning of web documents (or hypermedia
in general) is that of link versioning. Several different solutions are described
in [16]. There is no consensus on the issue of whether links should be modelled (and
represented) separately from the content. We can find approaches where links
are embedded in the content; or links are represented separately; or some links
are local and modelled within the content (their change causes creation of a
new version) and some links are external (represented separately, their change
does not cause creation of a new version). Although at present linking on the
web consists primarily of tags embedded in HTML, the XLink proposal provides
for storing links between XML documents externally to the documents they
reference (see http://www.w3c.org/XML/Linking). Realization of our model
does not restrict in any way the representation of versioned entities. Defining
the version-family model covers structure versioning, i.e. within each revision
relationships are maintained.

We do not introduce a new hypermedia model. Instead, we build on the Dexter
hypermedia model [9] and propose its specialisation with the aim of allowing
efficient navigation within a versioned web site.

6 Conclusions 

A versioned web site offers significant advantages to content developers and
readers. It provides mechanisms to allow version-dependent navigation through
the site. Content developers can concentrate on the content and relationships
between versioned families of web pages. They do not have to deal with the complexities
of versioning and versioned navigation implementation.

We have proposed a model of a versioned web site which aims at computer
support of the process of web site configuration building. We have concentrated
in this paper on modelling the presentation space (in contrast to many
existing approaches, which stress the development space). In fact, both spaces
can exploit the devised model. The basic distinction would be in the granularity of versioning.
A limitation of our approach is that the model does not provide substantive
web site structure modelling. However, there is no model which suits all purposes.

The advantages of our proposal include: 

- the model is simple and effective (version-family links are the main advantage
of the model, but they also produce the limitation mentioned above),

- the model can be used in web site design on several levels of abstraction,

- navigation within a versioned site is intuitive,

- the model is ready for use with current web technologies (as demonstrated
by the DiVer software tool).

The dynamic approach to resolving links improves the maintenance of web
site integrity. If a user has stored a bookmark to a particular version, it points
to the page family with version selection data. In the case of a missing version (or
no version satisfying the selection filter) the user still has the option to select
a different version.







The model can also serve during web site design (its structure and navigation).
Several models on various levels of abstraction can be constructed. The levels
of abstraction relate to the hierarchy of composite elements (web site versus web
page) and to interconnections (explicit versus implicit links).



References 

1. IETF WEBDAV working group, www.ics.uci.edu/pub/ietf/webdav 

2. Berners-Lee, T.: Versioning, 1990. A web page that is part of the original design 
notes for the WWW, available at www.w3.org/DesignIssues/Versioning.html 

3. Bielikova, M., Navrat, P.: Modelling Software Systems in Configuration Management.
Applied Mathematics and Computer Science 5(4) (1995) 751-764

4. Bielikova, M., Navrat, P.: Modelling Versioned Hypertext Documents. In: Magnus- 
son, B. (ed.): System Configuration Management, ECOOP’98 SCM-8 Symposium, 
Brussels, Belgium. LNCS 1439. Springer- Verlag (1998) 188-197 

5. Bielikova, M., Navrat, P.: An Approach to Automated Building of Software System
Configurations. International Journal of Software Engineering and Knowledge
Engineering 9(1) (1999) 73-95

6. Conradi, R., Westfechtel, B.: Version Models for Software Configuration Management.
ACM Computing Surveys 30(2) (1998) 232-282

7. Haake, A., Hicks, D.: VerSE: Towards Hypertext Versioning Styles. In: Proc. of
the 7th ACM Conf. on Hypertext, Washington DC, USA, March 1996, 224-234.
Available at www.cs.unc.edu/~barman/HT96/

8. Halasz, F.G.: Reflections on Notecards: Seven Issues for the Next Generation of
Hypermedia Systems. Communications of the ACM 31(7) (1988) 836-852

9. Halasz, F.G., Schwartz, M.: The Dexter Hypertext Reference Model. Communications
of the ACM 37(2) (1994) 30-39

10. Hicks, D.L., Leggett, J.J., Nurnberg, P.J., Schnase, J.L.: A Hypermedia Version
Control Framework. ACM Transactions on Information Systems 16(2) (1998) 127-160

11. Hunt, J.J., Reuter, J.: Using the Web for Document Versioning: An Implementation
Report for DeltaV. In: Proc. of the 23rd Int. Conf. on Software Engineering,
Toronto, May 2001. IEEE Press, 507-513

12. Navrat, P., Bielikova, M.: Knowledge Controlled Version Selection in Software 
Configuration Management. Software - Concepts and Tools 17 (1996) 40-48 

13. Noris, I.: Building a Configuration of Hypertext Documents. Master’s Thesis, 
Department of Computer Science and Engineering, Slovak University of Technology 
(2000), (supervised by Maria Bielikova) 

14. Sommerville, I., Rodden, T., Rayson, P., Kirby, A., Dix, A.: Supporting Information
Evolution on the WWW. World Wide Web 1(1) (January 1998) 45-54

15. Tichy, W.F.: RCS - a System for Version Control. Software - Practice and Experience
15(7) (1985) 637-654

16. Vitali, F.: Versioning Hypermedia. ACM Computing Surveys 31(4es) (1999)

17. Vitali, F., Durand, D.G.: Using Versioning to Support Collaboration on the WWW.
In: Proc. of 4th World Wide Web Conference (1995). Available at
www.w3.org/pub/Conferences/WWW4

18. Whitehead, E.J.: An Analysis of the Hypertext Versioning Domain. PhD Thesis, 
University of California, Irvine (2000) 

19. Whitehead, E.J.: Uniform Comparison of Data Models Using Containment Modeling.
In: Proc. of ACM Conf. on Hypertext - HT'02, ACM, June 2002, 182-191




Design of Secure Multicast Models for Mobile Services 



Elijah Blessing, R. and Rhymend Uthariaraj, V.

Ramanujan Computing Centre, Anna University, Chennai, India 
elijahblessing@yahoo.com, rhymend@annauniv.edu



Abstract. Mobile multicast service is an emerging technology in this decade.
Incorporating security features in multicast services gives rise to overheads and
computational complexities. The secure movement of members between areas
gives rise to additional overheads with reduced throughput and increased
complexities at the server. Designing an efficient and secure multicast model
for mobile services is a challenging area for researchers. In this paper, two
models have been designed and proposed. The algorithms for these models have
been experimentally simulated, tested, analyzed and compared with the already
existing models. The experimental results show that the proposed Enhanced
LeaSel model has increased throughput with complexity $O(1)$, when compared
to existing models whose complexity is $O(N_A + 1 + 2n)$.

Keywords. Multicast, mobile, security, complexity, encryptions, key distribution,
Basic simple model, Enhanced LeaSel.



1 Introduction 

Mobile multicast service is an efficient means of distributing data to a group of
participants. With the widespread use of the Internet, securing data transmission is an
important requirement for many mobile services. The data is secured by encrypting it
with the group key, which is shared by all the members of the mobile group. To
achieve backward confidentiality and forward confidentiality [11], the group key
should be changed whenever a member joins or leaves during the course of a multicast
service. When there are large mobile groups with members joining, leaving and
moving frequently, secure mobile multicast services give rise to additional
computational complexity and poor throughput.

A substantial literature is available on wired secure multicast services [1], [7], [8],
[6], [10], and research on mobile multicast services is in progress. In this
paper, two different secure mobile multicast models, viz. the Basic simple model and
the Enhanced LeaSel model, are designed and proposed. The algorithms of these models
are simulated, tested and analyzed for all the mobile multicast events. They are also
compared with the existing models. It is found that the Enhanced LeaSel model is best
suited for mobile services whose members move frequently between areas. The
experimental results show that when members move between areas, the Enhanced
LeaSel model has increased throughput with complexity $O(1)$.



* Research Scholar Under FIP from Karunya Institute of Technology, India 

P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 163-173, 2004.

© Springer-Verlag Berlin Heidelberg 2004 







The remainder of the paper is organized as follows. Section 2 briefly lists
the multicast events for mobile services. Section 3 presents the objectives in
designing the secure mobile multicast models. Section 4 gives a review of mobile
multicast models. Section 5 explains in detail the proposed multicast models and
their algorithms for mobile services. Section 6 shows the experimental results and
finally Section 7 concludes.



2 Mobile Multicast Events 

The following are the events for the mobile multicast services. 

Member JOIN: An authorized member joins the mobile service by sending a JOIN
request.

Member LEAVE: A member leaving the mobile multicast service can be voluntary or
compelled. In a voluntary LEAVE, the authorized member sends a LEAVE request to
the area controller. In a compelled LEAVE, if the member is not authorized to continue
in the mobile service, the area controller expels the member by sending an EXPEL
message.

Member TRANSFER: Due to mobility, the member moves from one mobile area to
another mobile area [4], [5].



3 Design Objectives 

The following are the objectives in designing an efficient multicast model for
mobile services.

Forward Confidentiality: When a member LEAVEs the mobile multicast service, the
encryption key should be updated for every LEAVE operation to prevent the former
member from accessing future communications. This property is termed Forward
Confidentiality [11], [12].

Backward Confidentiality: When a member JOINs the mobile multicast service, the
encryption key should be updated for every JOIN operation to prevent the new
member from accessing past communications. This property is termed Backward
Confidentiality [11], [12].

Mobility: Mobility affects performance only when members cross between areas. The rate
at which members cross areas is defined as the mobility transfer factor. Even if the
mobile members are highly dynamic, with a large mobility transfer factor, the model
should remain efficient with high throughput.

Computational efficiency: The computational complexity is determined by the
number of encryptions the server performs for key distribution and the total number
of rekey distribution messages the server unicasts or multicasts. A secure multicast
model for mobile services is said to be highly efficient only when it has low
computational complexity at the server for all mobile multicast events.







4 Review of Secure Multicast Models 

In this section, the algorithms proposed by B. Decleene et al. [4] are studied and
evaluated for all mobile multicast events. In this scheme [4], the Domain Key
Distributor (DKD) generates the data key and uses it for encrypting the data.
Whenever a new member joins a current session or an existing member leaves a
session, a new data key [4] is generated and distributed to ensure both forward and
backward confidentiality. The domain is further divided into areas, and the Area Key
Distributor (AKD) is responsible for distributing the data key to members within that
area.

When a member joins, the total number of key generations is 2. Therefore the key
generation cost is $O(2)$. For key distribution, the encryption complexity is $O(n + 2)$,
where $n$ is the number of AKDs in the domain. The rekey message distribution
complexity is $O(n + 2)$. When a member leaves, the total number of key generations is
also 2 and the key generation cost is $O(2)$. For key distribution, the encryption
complexity is $O(N_A - 1 + n)$, where $N_A$ is the number of members in the area. The
rekey message distribution complexity is $O(N_A - 1 + n)$. For member transfer, the
algorithms proposed by B. Decleene et al. are:

a) Baseline rekeying [4]: Here the total number of key generations is 4. Therefore
the key generation cost is $O(4)$. The encryption complexity is $O(N_A + 1 + 2n)$. The
rekey message distribution complexity is $O(N_A + 1 + 2n)$.

b) Immediate rekeying [4]: The total number of key generations is 2. Therefore the
key generation cost is $O(2)$. The encryption complexity is $O(N_A + 1)$. The rekey
message distribution complexity is $O(N_A + 1)$.
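The counts quoted in this section can be tabulated directly. The following Python sketch simply encodes the encryption counts stated above as functions of $N_A$ and $n$; the function name and the event labels are illustrative assumptions and are not taken from [4].

```python
def decleene_encryptions(event: str, n_a: int, n: int) -> int:
    """Encryption count for key distribution in the model of [4], as quoted above.

    n_a: number of members in the area (N_A); n: number of AKDs in the domain.
    """
    counts = {
        "join": n + 2,
        "leave": n_a - 1 + n,
        "transfer_baseline": n_a + 1 + 2 * n,
        "transfer_immediate": n_a + 1,
    }
    return counts[event]


# Example values only: 100 members in the area, 14 areas.
for event in ("join", "leave", "transfer_baseline", "transfer_immediate"):
    print(event, decleene_encryptions(event, n_a=100, n=14))
```

The same expressions give the rekey message counts, since the section states equal encryption and message distribution complexities for each event.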

5 Proposed Secure Multicast Models for Mobile Services 

In this section, the algorithms of the proposed secure multicast models for mobile
services are described. There are two types of specialized controllers that control,
manage, generate and distribute the keys: the Domain Controller (DC) and the Area
Controller (AC). All the members in the service belong to the domain controlled by a
domain controller. Based on the administrative regions, the domain is divided into
areas. Each area is controlled and managed by an area controller. The models use
three different keys, viz. member private key ($k$), domain key ($DK$) and area key ($AK$). The
domain key is shared between the domain controller and the area controllers. The area key is
shared between an area controller and all the members of its area. Let there be nine members in the
mobile multicast service MMS with individual keys $k_1, k_2, \ldots, k_9$ respectively. Let there
be three areas $A_1$, $A_2$ and $A_3$ controlled by area controllers $AC_1$, $AC_2$ and $AC_3$
respectively. Let $N_A$ be the number of members in the area. Assume that $A_3$ contains
members $m_7$, $m_8$, $m_9$ and the area controller generates $AK_3$. Throughout this section
the notation $[y]_x$ denotes that $y$ is encrypted with $x$.

Group creation: When members register to participate in the mobile service, the
DC securely distributes the member private key ($k$) to each registered member of the group.
Then the DC prepares the Access Control List (ACL) and distributes
(multicasts) it to all the ACs. The information in the ACL includes the session for
which the member is authorized to receive the mobile multicast data.





Initial Group:

$MMS = \{m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8, m_9\}$

$A_1 = \{m_1, m_2, m_3\}$; All members of $A_1$ know area key $AK_1$

$A_2 = \{m_4, m_5, m_6\}$; All members of $A_2$ know area key $AK_2$

$A_3 = \{m_7, m_8, m_9\}$; All members of $A_3$ know area key $AK_3$
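The example group can be written down as plain data, which is convenient when tracing the algorithms below by hand. This is a minimal sketch: the dictionary layout, the string identifiers and the helper `area_of` are all hypothetical and only mirror the example (nine members, three areas).

```python
# Hypothetical in-memory view of the example group; all identifiers are illustrative.
AREAS = {
    "A1": {"controller": "AC1", "area_key": "AK1", "members": ["m1", "m2", "m3"]},
    "A2": {"controller": "AC2", "area_key": "AK2", "members": ["m4", "m5", "m6"]},
    "A3": {"controller": "AC3", "area_key": "AK3", "members": ["m7", "m8", "m9"]},
}

# Every member m_i holds a private key k_i distributed by the domain controller.
MEMBER_KEYS = {m: f"k{m[1:]}" for area in AREAS.values() for m in area["members"]}


def area_of(member: str) -> str:
    """Return the name of the area that currently contains `member`."""
    for name, area in AREAS.items():
        if member in area["members"]:
            return name
    raise KeyError(member)


print(area_of("m7"), AREAS["A3"]["area_key"], MEMBER_KEYS["m7"])
```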



5.1 Basic Simple Model 

This model is well suited for multicast services whose members rarely move between
areas.

Member JOIN: When a new member joins, the area controller verifies and authorizes
it. The area controller then changes the area key $AK$ to $AK'$ to ensure backward
confidentiality. To accomplish this task, the area controller sends the
KEYUPDATE_JOIN message in encrypted form to the current members and the
joining member.

Member LEAVE: When a member voluntarily leaves the multicast service or when
the deputy controller expels a member, the area controller changes $AK$ to $AK'$ to ensure
forward confidentiality and distributes it securely to all the remaining members.

Member TRANSFER: The member transfer is treated as a member leaving one area
and joining another area. The member first sends a leave message to its local area
controller. The data transmission stops, and the local area controller updates the area
key for the remaining members and securely distributes it. The data transmission then
resumes. Meanwhile the transferring member informs the other area that it wishes to
join. The area controller of that area stops data transmission, verifies the transferring
member's authentication and, if approved, generates a new area key and distributes it to the
present members and the transferring member.

Member JOIN Algorithm

Let $A_3 = \{m_7, m_8, m_9\}$ and member $m_{10}$ joins $A_3$.

Step 1: $m_{10}$ sends JOIN message to $AC_3$
Step 2: $AC_3$ verifies with $ACL_3$; if approved, go to step 3, else do not authenticate $m_{10}$
Step 3: $AC_3$ sends an approval message to $m_{10}$ and generates new area key $AK_3'$
Step 4: $AC_3$ stops data transmission
Step 5: $AC_3$ distributes area key $AK_3'$ as follows:
    $AC_3 \to m_7, m_8, m_9 : [AK_3']_{AK_3}$ (multicast)
    $AC_3 \to m_{10} : [AK_3']_{k_{10}}$ (unicast)
Step 6: $AC_3$ resumes data transmission as follows:
    $MMS \to AC_1, AC_2, AC_3 : [Data]_{DK}$ (multicast)
    $AC_3 \to A_3 : [Data]_{AK_3'}$ (multicast)

And now $A_3 = \{m_7, m_8, m_9, m_{10}\}$.

For member JOIN, the key distribution takes two encryptions (one multicast to the
current members and one unicast to the new member), so the encryption complexity is
$O(2)$ and the rekey message distribution complexity is also $O(2)$.
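The JOIN steps above can be sketched as follows. This is an illustrative Python sketch, not the authors' Java simulation: `encrypt` is a stand-in that only records which key protects which payload, and the names `basic_join`, `keys` and the key strings are assumptions made for the example.

```python
import secrets


def encrypt(payload: str, key: str) -> tuple:
    """Stand-in for [payload]_key; a real deployment would call an actual cipher here."""
    return ("enc", key, payload)


def basic_join(members: set, member_keys: dict, area_key: str, newcomer: str):
    """Rekey one area after `newcomer` has been approved (Steps 3 to 5)."""
    new_area_key = secrets.token_hex(16)  # Step 3: fresh area key AK'
    messages = [
        # One multicast to the current members, protected by the old area key.
        ("multicast", tuple(sorted(members)), encrypt(new_area_key, area_key)),
        # One unicast to the newcomer, protected by its private key.
        ("unicast", newcomer, encrypt(new_area_key, member_keys[newcomer])),
    ]
    return members | {newcomer}, new_area_key, messages


keys = {m: f"k_{m}" for m in ("m7", "m8", "m9", "m10")}
area, ak, msgs = basic_join({"m7", "m8", "m9"}, keys, "AK3", "m10")
print(len(msgs))  # two encryptions and two rekey messages, independent of N_A
```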



Member LEAVE Algorithm 

Let $A_3 = \{m_7, m_8, m_9, m_{10}\}$ and member $m_{10}$ LEAVEs $A_3$.







Step 1: a) $m_{10}$ sends LEAVE message to $AC_3$, or b) $AC_3$ sends EXPEL message to $m_{10}$ if
it is not authorized for the new session.
Step 2: $AC_3$ generates new area key $AK_3'$
Step 3: $AC_3$ stops data transmission
Step 4: $AC_3$ distributes area key $AK_3'$ as follows:
    $AC_3 \to m_7 : [AK_3']_{k_7}$ (unicast)
    $AC_3 \to m_8 : [AK_3']_{k_8}$ (unicast)
    $AC_3 \to m_9 : [AK_3']_{k_9}$ (unicast)
Step 5: $AC_3$ resumes data transmission as follows:
    $MMS \to AC_1, AC_2, AC_3 : [Data]_{DK}$ (multicast)
    $AC_3 \to A_3 : [Data]_{AK_3'}$ (multicast)

And now $A_3 = \{m_7, m_8, m_9\}$.

For member LEAVE, each remaining member receives the new key by unicast, so the
encryption complexity for key distribution is $O(N_A - 1)$ and the rekey message
distribution complexity is also $O(N_A - 1)$.
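The LEAVE steps above can be sketched in the same style. Again this is only an illustration under assumptions: the tuple `("enc", key, payload)` stands in for real encryption, and `basic_leave` and the key strings are hypothetical names.

```python
import secrets


def basic_leave(members: set, member_keys: dict, leaver: str):
    """Rekey one area after `leaver` departs; returns (members, new key, messages)."""
    remaining = members - {leaver}
    new_area_key = secrets.token_hex(16)  # Step 2: fresh area key AK'
    # The old area key is known to the leaver, so each remaining member
    # receives AK' under its own private key: one unicast per member.
    messages = [
        ("unicast", m, ("enc", member_keys[m], new_area_key))
        for m in sorted(remaining)
    ]
    return remaining, new_area_key, messages


keys = {m: f"k_{m}" for m in ("m7", "m8", "m9", "m10")}
area, ak, msgs = basic_leave({"m7", "m8", "m9", "m10"}, keys, "m10")
print(len(msgs))  # N_A - 1 = 3 unicasts for N_A = 4
```

Unlike JOIN, the old area key cannot protect the update, which is why the cost grows with the size of the area.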

Member TRANSFER Algorithm

Let member $m_{10}$ TRANSFER from $A_3$ to $A_2$.

Member transfer is treated as a member LEAVE from $A_3$ followed by a member JOIN in $A_2$.

$A_3 = \{m_7, m_8, m_9, m_{10}\}$

Step 1: $m_{10}$ sends LEAVE message to $AC_3$
Step 2: $AC_3$ sends an approval message to $m_{10}$ and generates new area key $AK_3'$
Step 3: $AC_3$ stops data transmission
Step 4: $AC_3$ distributes area key $AK_3'$ as follows:
    $AC_3 \to m_7 : [AK_3']_{k_7}$ (unicast)
    $AC_3 \to m_8 : [AK_3']_{k_8}$ (unicast)
    $AC_3 \to m_9 : [AK_3']_{k_9}$ (unicast)
Step 5: $AC_3$ resumes data transmission as follows:
    $MMS \to AC_1, AC_2, AC_3 : [Data]_{DK}$ (multicast)
    $AC_3 \to A_3 : [Data]_{AK_3'}$ (multicast)

And now $A_3 = \{m_7, m_8, m_9\}$.

$A_2 = \{m_4, m_5, m_6\}$ and member $m_{10}$ joins $A_2$.

Step 6: $m_{10}$ sends JOIN message to $AC_2$
Step 7: $AC_2$ verifies with $ACL_2$; if approved, go to step 8, else do not authenticate $m_{10}$
Step 8: $AC_2$ sends an approval message to $m_{10}$ and generates new area key $AK_2'$
Step 9: $AC_2$ stops data transmission
Step 10: $AC_2$ distributes area key $AK_2'$ as follows:
    $AC_2 \to m_4, m_5, m_6 : [AK_2']_{AK_2}$ (multicast)
    $AC_2 \to m_{10} : [AK_2']_{k_{10}}$ (unicast)
Step 11: $AC_2$ resumes data transmission as follows:
    $MMS \to AC_1, AC_2, AC_3 : [Data]_{DK}$ (multicast)
    $AC_2 \to A_2 : [Data]_{AK_2'}$ (multicast)

And now $A_2 = \{m_4, m_5, m_6, m_{10}\}$.

For member transfer, the LEAVE part costs $N_A - 1$ encryptions and the JOIN part costs 2,
so the encryption complexity is $O(N_A + 1)$ if there are equal numbers of members in each
area, and the rekey message distribution complexity is also $O(N_A + 1)$.
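The transfer cost follows from adding the LEAVE and JOIN counts of the Basic simple model. A short worked form, where $N_A$ is taken to be the size of the old area before the member leaves (an interpretation consistent with the LEAVE example, in which four members yield three unicasts):

```latex
\[
\underbrace{(N_A - 1)}_{\text{LEAVE from the old area}}
\;+\;
\underbrace{2}_{\text{JOIN in the new area}}
\;=\; N_A + 1 .
\]
% Example from the text: N_A = 4 in A_3 gives 3 unicasts,
% plus 1 multicast and 1 unicast in A_2, i.e. 5 encryptions in total.
```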







5.2 Enhanced LeaSel Model 

The LeaSel model has been shown to be a highly secure [7], [9] and fault-tolerant [13]
multicast model for wired multicast services. It has also been shown to be highly
efficient [2] in terms of computational complexity. In this model, the group key
generation and distribution is performed by the leader and is completely hidden
from the members of the group. Mathematical and experimental evaluations have
shown [7], [9] that it is very difficult to break the multicast service. To
support mobility between areas, the area controller maintains, in addition to the access
control list, an Area Key Holders List (AKHL). The AKHL contains those active
members who are no longer in its area but are still present in the multicast service
and hold the current area key. The AKHL helps to avoid unnecessary stoppage of data
transmission and also reduces the computational complexity and overheads at the area
controller during transfer. Every member possesses a member area key list (MAKL), in
which a key ID identifies each key. When a member wants to transfer from one area $A_i$
to another area $A_j$, it sends a MEMBER_TRANSFER message to both areas $A_i$ and $A_j$. On
receiving the MEMBER_TRANSFER message, $A_j$ checks its AKHL to find whether the
member possesses the valid area key. If the member does not possess the valid area key,
$A_j$ transmits the current area key encrypted with the member’s individual key. $A_i$, on
the other hand, updates its AKHL and does not change the area key.

Member JOIN Algorithm

Let $A_3 = \{m_7, m_8, m_9\}$ and member $m_{10}$ joins $A_3$.

Step 1: $m_{10}$ sends JOIN message to $AC_3$
Step 2: $AC_3$ verifies with $ACL_3$; if approved, go to step 3, else do not authenticate $m_{10}$
Step 3: $AC_3$ sends an approval message to $m_{10}$ and triggers the KGM module of the area
leader.
Step 4: $L_3$ generates new area key $AK_3'$
Step 5: $L_3$ stops data transmission
Step 6: $L_3$ distributes area key $AK_3'$ as follows:
    $L_3 \to m_7, m_8, m_9 : [AK_3']_{AK_3}$ (multicast)
    $L_3 \to m_{10} : [AK_3']_{k_{10}}$ (unicast)
Step 7: Data transmission resumes as follows:
    $MMS \to AC_1, AC_2, AC_3 : [Data]_{DK}$ (multicast)
    $AC_3 \to L_3 : [Data]_{DK}$ (unicast)
    $L_3 \to A_3 : [Data]_{AK_3'}$ (multicast)

And now $A_3 = \{m_7, m_8, m_9, m_{10}\}$.

For member JOIN, the key distribution again takes one multicast and one unicast, so the
encryption complexity for key distribution is $O(2)$ and the rekey message distribution
complexity is also $O(2)$.

Member LEAVE Algorithm

Let $A_3 = \{m_7, m_8, m_9, m_{10}\}$ and member $m_{10}$ LEAVEs $A_3$.

Step 1: a) $m_{10}$ sends LEAVE message to $AC_3$, or b) $AC_3$ sends EXPEL message to $m_{10}$
if it is not authorized for the new session.
Step 2: $AC_3$ forwards it to $L_3$, and $L_3$ removes $m_{10}$ from $AKHL_3$ and generates new
area key $AK_3'$







Step 3: $L_3$ stops data transmission
Step 4: $L_3$ distributes area key $AK_3'$ as follows:
    $L_3 \to m_7 : [AK_3']_{k_7}$ (unicast)
    $L_3 \to m_8 : [AK_3']_{k_8}$ (unicast)
    $L_3 \to m_9 : [AK_3']_{k_9}$ (unicast)
Step 5: The data transmission resumes as follows:
    $MMS \to AC_1, AC_2, AC_3 : [Data]_{DK}$ (multicast)
    $AC_3 \to L_3 : [Data]_{AK_3}$ (unicast)
    $L_3 \to A_3 : [Data]_{AK_3'}$ (multicast)

And now $A_3 = \{m_7, m_8, m_9\}$.

For member LEAVE, each remaining member receives the new key by unicast, so the
encryption complexity for key distribution is $O(N_A - 1)$ and the rekey message
distribution complexity is also $O(N_A - 1)$.

Member TRANSFER Algorithm

Consider $m_{10}$ transferring from $A_2$ to $A_3$.

Step 1: $m_{10}$ sends MEMBER_TRANSFER message to $A_2$ and $A_3$.
Step 2: $A_2$ and $A_3$ forward it to $L_2$ and $L_3$ respectively.
Step 3: $L_2$ updates its $AKHL_2$ by including the transferring member in its list. The
data transmission continues uninterrupted:
    $AC_2 \to L_2 : [Data]_{DK}$ (unicast)
    $L_2 \to A_2 : [Data]_{AK_2}$ (multicast)
Step 4: $L_3$ checks $AKHL_3$ and verifies whether $m_{10}$ possesses the current area key. If not,
go to step 5, else go to step 7.
Step 5: The data transmission stops.
Step 6: $L_3$ distributes the current area key to $m_{10}$ as follows:
    $L_3 \to m_{10} : [AK_3]_{k_{10}}$ (unicast)
and go to step 8.
Step 7: The data is distributed and the member uses the area key available in the
member area key list (MAKL).
Step 8: The data transmission resumes:
    $AC_3 \to L_3 : [Data]_{AK_3}$ (unicast)
    $L_3 \to A_3 : [Data]_{AK_3}$ (multicast)

If the transferring member possesses the current area key, then there are no
encryptions for key distribution and there are no key distribution messages. Otherwise,
a single unicast is needed, so the encryption complexity for key distribution and the
key distribution complexity are both $O(1)$.
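The transfer steps above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the `Area` class, the `transfer` function and the tuple standing in for encryption are hypothetical, and dropping a member from the destination AKHL once it becomes a local member is an assumption that the text does not spell out. The sketch also does not model what happens to an AKHL when an area key changes.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    name: str
    area_key: str
    members: set = field(default_factory=set)
    akhl: set = field(default_factory=set)  # Area Key Holders List


def transfer(member: str, src: Area, dst: Area) -> list:
    """Move `member` from `src` to `dst`; return the key distribution messages sent."""
    # Step 3: the old area keeps its key and records that the member still holds it.
    src.members.discard(member)
    src.akhl.add(member)

    messages = []
    if member in dst.akhl:
        # Step 7: the member already holds the current key of the new area.
        dst.akhl.discard(member)  # assumption: entry dropped once the member is local
    else:
        # Step 6: one unicast of the current area key under the member's private key.
        messages.append(("unicast", member, ("enc", f"k_{member}", dst.area_key)))
    dst.members.add(member)
    return messages


a2 = Area("A2", "AK2", {"m4", "m5", "m6", "m10"})
a3 = Area("A3", "AK3", {"m7", "m8", "m9"})
print(len(transfer("m10", a2, a3)))  # first visit to A3: one unicast
print(len(transfer("m10", a3, a2)))  # back to A2, whose key m10 still holds: none
```

In both cases neither area changes its area key, which is where the saving over the Basic simple model comes from.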



6 Results and Discussion 

The multicast models for the mobile service are simulated and run with all nodes
moving at medium speed. The code is written in Java. For the different mobile
multicast events, the number of encryptions for key distribution and the number of
key distribution messages are experimentally determined. Further, all the algorithms
are evaluated for different mobile environments with different mobility transfer
factors. For the experimental setup, it is assumed that the members of the mobile
multicast service are equally divided into 14 areas, each controlled by its area







controller. In the graphs, the Basic simple model is represented as BS and the 
Enhanced LeaSel model is represented as EN_LEA. 

Experiment 1: A member was allowed to join the area $A_1$ by sending the JOIN request
to its area controller $AC_1$. For different numbers of members in the area, the total number of
encryptions for distributing the key and the number of rekey distribution messages
due to member JOIN were determined. The results are shown in Fig. 1(a) and Fig.
1(b) respectively.



[Figure: two plots, (a) and (b), comparing the B.Decleene model with BS, EN_LEA;
x-axis: No. of Members in Area (200 to 1000)]

Fig. 1. Member JOIN a) Number of Encryptions vs. Number of Members b) Key distribution
messages vs. Number of Members



Experiment 2: From the area A,, a member was allowed to leave the service by 
sending a LEAVE request to its area controller. The total number of encryptions for
distributing the key and the number of rekey distribution messages due to member
LEAVE were determined. The results are shown in Fig. 2(a) and Fig. 2(b)
respectively.

Experiment 3: For different numbers of members in the mobile service, the member
transfer algorithms of the models are run and the total number of control messages
is determined with the mobility transfer factor set to 10% and 20%. The results
are shown in Fig. 3(a) and Fig. 3(b) respectively.

Experiment 4: For different numbers of members in the mobile service, the models are
run and the number of rekey messages distributed is determined with the mobility
transfer factor set to 10% and 30%. The results are shown in Fig. 4(a) and
Fig. 4(b) respectively.

Experiment 5: The performance of the mobile service for all the models is evaluated
by determining the number of times the sender stops and resumes the data
transmission, i.e., the interruption of data transmission, for different numbers of member
transitions. The results are shown in Fig. 5(a).

Experiment 6: The multicast models are run with all nodes moving at medium speed
with 20% mobility transfer factor. Each area is flooded with packets such that the
congestion level in each area is 10%. For different numbers of members in the mobile
service, the time required to successfully transmit 50 packets of data is determined
and the results are shown in Fig. 5(b).



[Figure: two plots, (a) and (b), comparing the B.Decleene model with BS, EN_LEA]

Fig. 2. Member LEAVE a) Number of Encryptions vs. Number of Members b) Rekey
distribution messages vs. Number of Members




[Figure: two plots, (a) and (b); x-axis: Number of Members]

Fig. 3. Number of control messages vs. Number of members a) For mobility transfer factor =
10% b) For mobility transfer factor = 20%



Fig. 1 and Fig. 2 show that the proposed models have lower encryption cost
with reduced overheads. Fig. 3 shows that the Enhanced LeaSel
model gives better results than the Basic simple model. Fig. 4 shows that the
Enhanced LeaSel model is best suited for mobile multicast services whose members
move dynamically between areas, and is also suited for mobile services with a large
number of members. Fig. 5(a) shows that for the Enhanced LeaSel model, there












[Figure: two plots, (a) and (b); x-axis: Number of Members]

Fig. 4. Number of rekey messages distributed vs. Number of members for (a) Mobility transfer
factor = 10% (b) Mobility transfer factor = 30%




[Figure: two plots, (a) and (b), comparing EN_LEA with BS; x-axis of (b): Number of Members]

Fig. 5. a) Number of stop and resume messages vs. Number of transitions b) Simulation time
vs. Number of members.



is very little interruption of data transmission. Fig. 5(b) shows that the Basic simple
model has poor throughput, whereas the Enhanced LeaSel model has relatively good
throughput.



7 Conclusion 

In this paper, two different secure mobile multicast models are designed, simulated,
tested and analyzed for all the mobile multicast events, and compared with the
existing models. The results show that the Enhanced LeaSel model has better
performance in terms of complexity, overheads and throughput.











References 



1. Wong, C., Gouda, M., Lam, S.S.: Secure Group Communication Using Key Graphs.
IEEE/ACM Transactions on Networking, Vol. 8, no. 1 (2000) 16-30

2. Elijah Blessing, R., Rhymend Uthariaraj, V.: Evaluation and Analysis of Computational
Complexity for Secure Multicast Models. In: Kumar, V. et al. (eds.): Computational
Science and Its Applications. Lecture Notes in Computer Science, Vol. 2668.
Springer-Verlag, Berlin Heidelberg New York (2003) 684-694

3. Gong, G., Shacham, N.: Multicast Security and Its Extension to Mobile Environment.
Wireless Networks (1995) 281-295

4. Decleene, B., Dondeti, L. et al.: Secure Group Communications for Wireless Networks.
Proceedings of INFOCOM (2001)

5. Bhargava, B. et al.: Fault Tolerant Authentication and Group Key Management in Mobile
Computing. CERIAS Technical Report (2000)

6. Berkovits, S.: How to Broadcast a Secret. In: Davies, D.W. (ed.): Advances in Cryptology.
Lecture Notes in Computer Science, Vol. 547. Springer-Verlag, Berlin Heidelberg New
York (1991) 535-541

7. Elijah Blessing, R., Rhymend Uthariaraj, V.: Secure and Efficient Scalable Multicast
Model for Online Network Games. Proceedings of International Conference ADCOG
(2003) 8-15

8. Fiat, A., Naor, M.: Broadcast Encryption. In: Stinson, D.R. (ed.): Advances in Cryptology.
Lecture Notes in Computer Science, Vol. 773. Springer-Verlag, Berlin Heidelberg New
York (1994) 480-491

9. Elijah Blessing, R., Rhymend Uthariaraj, V.: LeaSel: An Efficient Key Management Model
for Scalable Secure Multicast System. Proceedings of International Conference ICORD
(2002)

10. Wallner, D.M., Harder, E.J., Agee, R.C.: Key Management for Multicast: Issues and
Architectures. RFC 2627 (July 1997)

11. Mittra, S.: Iolus: A Framework for Scalable Secure Multicasting. Proceedings of ACM
SIGCOMM (1997) 277-288

12. Harney, H., Muckenhirn, C.: Group Key Management Protocol (GKMP) Architecture. RFC
2094 (July 1997)

13. Elijah Blessing, R., Rhymend Uthariaraj, V.: Fault Tolerant Analysis of Secure Multicast
Models. Accepted for Presentation in International IEEE Conference ICICS-PCM (2003)




Some Notes on the Complexity of Protein 
Similarity Search under mRNA Structure 
Constraints 



Dirk Bongartz 

Lehrstuhl für Informatik I, RWTH Aachen
Ahornstraße 55, 52074 Aachen, Germany

bongartz@cs.rwth-aachen.de



Abstract. In [2], Backofen et al. propose the MRSO problem, that is,
to compute an mRNA sequence of maximal similarity to a given mRNA
and a given protein that additionally satisfies some secondary structure
constraints. The study of this problem is motivated by an application in
the area of protein engineering. Modeled in a mathematical framework,
we would like to compute a string $s \in \{a, b, \bar{a}, \bar{b}\}^{3n}$ which maximizes the
sum of the values of $n$ functions, which are applied blockwise to triples
of $s$, and additionally satisfies some complementary constraints on the
characters of $s$ given in terms of position pairs. While the decision version
of this problem is known to be NP-complete (see [2]), we prove here the
APX-hardness of the general as well as of a restricted version of the problem.
Moreover, we attack the problem by proposing a 4-approximation
algorithm.



1 Introduction 

Before we describe the MRSO problem, which we will consider throughout this 
paper, in a mathematical framework, we first give a rough idea of the biological 
background by which the problem is motivated. 

The fundamental process in molecular biology, and also in biology itself,
is the transformation of hereditary information coded in DNA into proteins.
DNA as well as proteins are long chains of smaller molecular entities, so-called
nucleotides and amino acids, respectively. In nature, we distinguish four different
types of nucleotides and about twenty amino acids. Thus, we can view these
molecules as strings over the corresponding alphabet of nucleotides or amino
acids. Furthermore, certain types of nucleotides can establish bonds to each other,
which makes it possible to connect two DNA single strands to
one double strand, which is actually the conformation in which DNA occurs in nature.
A rather similar molecule, called RNA, also consists of nucleotides but appears
as a single strand and thus has the possibility to establish bonds between
nucleotides of the same strand, resulting in the so-called secondary structure of
the RNA. This secondary structure is usually described by a set of index pairs,
representing the positions of nucleotides in the string establishing a bond. Some
examples of these secondary structures are visualized in Figure 1.



P. Van Emde Boas et al. (Eds.): SOFSEM 2004, LNCS 2932, pp. 174-183, 2004. 
© Springer-Verlag Berlin Heidelberg 2004







[Figure: (a) Hairpin; (b) Pseudoknot]

$S_a = \{\{1, 16\}, \{2, 15\}, \{3, 14\}, \{4, 13\}, \{5, 12\}\}$

$S_b = \{\{2, 15\}, \{3, 14\}, \{4, 13\}, \{10, 24\}, \{11, 23\}, \{12, 22\}\}$

Fig. 1. Two types of secondary structure elements occurring in RNA



The transformation of DNA into proteins is divided into two subprocesses.
In the first step a copy of the DNA is constructed in a process called
transcription, providing a special kind of RNA molecule, denoted as messenger
RNA, mRNA for short. In the second step, the translation, this mRNA is translated
into an amino acid sequence by reading triples of nucleotides in a blockwise
fashion, each coding for one specific amino acid. This process is universal for all
living creatures. Biologists found out that, depending on the secondary structure
of the mRNA, a certain triple of nucleotides might encode different things.
In particular, in the investigated case a triple might code for a STOP (terminating
the construction of the amino acid sequence) or, if followed by a special
secondary structure known as a hairpin loop (see Figure 1(a)), for the amino acid
selenocysteine, which enhances the function of the resulting protein [5].
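The blockwise reading of triples can be illustrated with a small Python sketch. It is only a toy: the codon table holds four standard entries rather than the full genetic code, UGA is used as the triple that can be read either as STOP or as selenocysteine, and the parameter `sec_positions` is a hypothetical stand-in for "this triple is followed by a suitable hairpin loop", which the sketch does not detect itself.

```python
# Four standard codon assignments; a complete table would have 64 entries.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UGA": "STOP"}


def translate(mrna: str, sec_positions=frozenset()) -> list:
    """Read `mrna` triple by triple and return the encoded amino acids.

    sec_positions: 0-based start indices of UGA triples to be read as
    selenocysteine (Sec) instead of STOP (an assumption of this sketch).
    """
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and i in sec_positions:
            protein.append("Sec")
            continue
        amino = CODON_TABLE[codon]
        if amino == "STOP":
            break  # translation terminates at a STOP triple
        protein.append(amino)
    return protein


print(translate("AUGUUUUGAGGC"))                     # UGA read as STOP
print(translate("AUGUUUUGAGGC", sec_positions={6}))  # UGA read as selenocysteine
```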

Due to the enhancing effect of this mechanism it would be useful to utilize
this knowledge in the context of protein engineering. Thus we try to determine,
for a given mRNA and a given protein, a sequence of nucleotides which has maximal
similarity to the given mRNA, whose induced protein sequence is
maximally similar to the given protein, and which additionally obeys some specified
secondary structure constraints.

In [2] Backofen, Narayanaswamy, and Swidan modeled this problem in terms
of an optimization problem called MRSO (mRNA Structure Optimization [1]),
whose complexity we will investigate in the sequel of this paper.

To describe the idea of the problem formulation, let $r = r_1 \ldots r_{3n}$ be a string
over an alphabet $\Sigma$ and $p = p_1 \ldots p_n$ a string over an alphabet $\Sigma'$, representing
the given mRNA and the given protein, respectively. Thus, we look for a string
$s = s_1 \ldots s_{3n}$ which is most similar to $r$ and $p$ as suggested by Figure 2, with the
additional requirement that $s$ satisfies certain secondary structure constraints.

In Figure 2 the string $p'$ represents the amino acid sequence inferred from $s$
by blockwise translation of triples of nucleotides into one amino acid. By $\sim$ we
visualize the similarity between the given nucleotides from $r$ and $s$, $(r_j \sim s_j)$,
and the given amino acids from $p$ and the inferred amino acids from $p'$, $(p_j \sim p'_j)$.







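\[
\begin{array}{rcccccc}
p  & = & p_1 & \cdots & p_i & \cdots & p_n \\
   &   & \wr &        & \wr &        & \wr \\
p' & = & p'_1 & \cdots & p'_i & \cdots & p'_n \\
   &   & \uparrow &   & \uparrow &   & \uparrow \\
s  & = & s_1 s_2 s_3 & \cdots & s_{3i-2} s_{3i-1} s_{3i} & \cdots & s_{3n-2} s_{3n-1} s_{3n} \\
   &   & \wr\,\wr\,\wr & & \wr\,\wr\,\wr & & \wr\,\wr\,\wr \\
r  & = & r_1 r_2 r_3 & \cdots & r_{3i-2} r_{3i-1} r_{3i} & \cdots & r_{3n-2} r_{3n-1} r_{3n}
\end{array}
\]

Fig. 2. Idea of the MRSO problem. (This picture stems from [2], except for a slightly
differing notation.)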



We will introduce this model more formally in Section 2, where we also formalize 
the similarity and the structure constraints. 

For the MRSO problem it has been shown in [2] that there exists a linear time 
algorithm if the considered secondary structure corresponds to an outerplanar 
graph. For the general case, an NP-completeness result has been obtained for 
the decision version of the problem, and a 2-approximation algorithm has been 
proposed. 

After presenting some preliminaries in Section 2, we will improve the hardness
results of [2] by giving two APX-hardness proofs including explicit lower
bounds. Moreover, in Section 4, we present a 4-approximation algorithm based
on a greedy approach. Finally, in Section 5, we propose to attack the problem
by applying the concept of parameterized complexity.

2 Preliminaries 

Let us now consider Figure 2 again. Let $r = r_1 \ldots r_{3n}$ be a string over an
alphabet $\Sigma = \{a, b, \bar{a}, \bar{b}\}$ of size four, where $a$ and $\bar{a}$ as well as $b$ and $\bar{b}$ denote
complementary nucleotides, between which bonds can be established according
to a certain secondary structure.¹ Any other pairing is not allowed. Moreover,
it is usually assumed that bonds can only occur between complementary
nucleotides which are at least 4 positions apart. Let $p = p_1 \ldots p_n$ be a string over
an alphabet $\Sigma'$ representing a protein.²

To fix the similarities $s_j \sim r_j$ and $p_j \sim p'_j$, we can provide a set of functions
$f_i$, for $1 \leq i \leq n$, which assign a value to the triple $(s_{3i-2} s_{3i-1} s_{3i})$ according to
the similarity of this triple to $(r_{3i-2} r_{3i-1} r_{3i})$ and $p_i$.

Our goal is to compute a string $s$ such that the sum of these function values
is maximized under the constraints given by the secondary structure. For this
problem, it should be clear that it is sufficient to know the set of functions;
knowing the strings $r$ and $p$ is not necessary.

¹ In a biological setting it is more usual to use the alphabet {A, C, G, U} denoting the
four types of nucleotides occurring in RNA (adenine, cytosine, guanine, and uracil).
Bonds may form between A and U, and between C and G.

2 In biology one considers an alphabet of size 20 corresponding to the standard amino 
acids. 




Some Notes on the Complexity of Protein Similarity Search 177 

Moreover, instead of providing a set of pairs of positions we can represent 
the secondary structure constraints in terms of an undirected graph. 

Thus our constraints can be given in terms of the following structure graph. 

Definition 1. (similar to [2]) Let $S \subseteq \{\{i,j\} \mid 1 \le i < j \le 3n\}$ represent
a secondary structure of an mRNA of length $3n$. Then the structure graph $G =
(V, E)$ is defined by $V := \{1, \dots, 3n\}$, $E := S$.

In the sequel, we will always think of structure graphs in a way such that 
the vertices are laid out on a line in increasing order. In Figure 3 we depict the 
structure graphs corresponding to the secondary structures given in Figure 1 
represented in this style. 



[Figure 3: two panels, "Hairpin" and "Pseudoknot".]

Fig. 3. Structure graphs for hairpin and pseudoknot secondary structures, where
vertices are laid out on a line.



Next we give the formal definition of the MRSO problem in terms of an 
optimization problem. 

Definition 2. MRSO denotes the following maximization problem. 

Input: A structure graph $G = (V, E)$, with $V = \{1, \dots, 3n\}$, and $n$ functions
$f_i : \Sigma^3 \to \mathbb{Q}^{\ge 0}$ for $1 \le i \le n$, where $\Sigma = \{a, b, \bar{a}, \bar{b}\}$.

Constraint: For all inputs $x = (G, f_1, \dots, f_n)$, the set of feasible solutions is
defined by:

$$\mathcal{M}(x) = \{ s = s_1 \dots s_{3n} \in \Sigma^{3n} \mid \{i,j\} \in E \text{ implies } s_i = \bar{s}_j \},$$

where $\bar{\bar{a}} = a$ and $\bar{\bar{b}} = b$. (We call this the complementary constraint.)

Costs: For all inputs $x = (G, f_1, \dots, f_n)$ and feasible solutions $s = s_1 \dots s_{3n} \in
\mathcal{M}(x)$,

$$\mathrm{cost}(s, x) = \sum_{i=1}^{n} f_i(s_{3i-2}, s_{3i-1}, s_{3i}).$$



Goal: Maximization. 
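
To make Definition 2 concrete, the following is a minimal Python sketch of the feasibility test and the cost function. The encoding is our own assumption and not part of the original text: the barred letters are written as "A" and "B", edges are pairs of 1-based positions, and the names `is_feasible`, `cost`, and `brute_force_mrso` are hypothetical. The exhaustive search is only an illustration for very small n, not an algorithm proposed in this paper.

```python
from fractions import Fraction
from itertools import product

# Encoding assumption (ours): "A" and "B" stand for the barred letters.
COMPLEMENT = {"a": "A", "A": "a", "b": "B", "B": "b"}
SIGMA = "aAbB"


def is_feasible(s, edges):
    """Complementary constraint: {i, j} in E implies s_i is the complement of s_j.

    Positions are 1-based, as in Definition 2."""
    return all(s[i - 1] == COMPLEMENT[s[j - 1]] for i, j in edges)


def cost(s, fs):
    """Sum of f_i(s_{3i-2}, s_{3i-1}, s_{3i}) over all n triples."""
    assert len(s) == 3 * len(fs)
    return sum((f(tuple(s[3 * i:3 * i + 3])) for i, f in enumerate(fs)),
               Fraction(0))


def brute_force_mrso(edges, fs):
    """Exhaustive search over all 4^(3n) strings; only usable for tiny n.

    Returns (best value, best string), or None if no string is feasible."""
    best = None
    for letters in product(SIGMA, repeat=3 * len(fs)):
        s = "".join(letters)
        if is_feasible(s, edges):
            value = cost(s, fs)
            if best is None or value > best[0]:
                best = (value, s)
    return best
```

For instance, with n = 2, the single edge {1, 6}, and both functions counting the occurrences of a in their triple, positions 1 and 6 cannot both carry a, so the optimum found by the search is 5.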




178 D. Bongartz 



Note that a feasible solution of MRSO may also be viewed as an assignment
of labels from $\Sigma$ to the vertices of $G$, such that the complementary constraint
is satisfied.

Sometimes we will prefer a representation that focuses on the amino acid
level instead of the nucleotide level. Therefore we introduce the definition of the
so-called implied structure graph.

Definition 3. (similar to [2]) Let $S \subseteq \{\{i,j\} \mid 1 \le i < j \le 3n\}$ represent
a secondary structure of an mRNA of length $3n$. Then the implied structure
graph $G_{\mathrm{impl}} = (V, E)$ is defined by

$V := \{1, \dots, n\}$,

$E := \{\{x,y\} \mid$ there exists a pair $\{i,j\} \in S$
such that $i \in \{3x-2, 3x-1, 3x\}$ and $j \in \{3y-2, 3y-1, 3y\}\}$.

Thus the implied structure graph may also be thought of as the structure
graph where we blockwise join three consecutive vertices to one supervertex, and
a feasible solution of the MRSO problem will assign a triple from $\Sigma^3$ to each
vertex of the implied structure graph.
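
The blockwise construction of Definition 3 can be sketched in a few lines of Python. This is an illustration under our own assumptions: the function name `implied_structure_graph` is hypothetical, pairs are given as tuples of 1-based positions, and a pair whose two positions fall into the same block is skipped, since it would only produce a self-loop (such pairs cannot occur when bonded positions are at least 4 apart).

```python
def implied_structure_graph(n, pairs):
    """Build G_impl = (V, E) from a secondary structure on positions 1..3n.

    Position i belongs to block ceil(i / 3), i.e. to the supervertex x with
    i in {3x-2, 3x-1, 3x}."""
    def block(i):
        return (i + 2) // 3

    vertices = set(range(1, n + 1))
    edges = set()
    for i, j in pairs:
        x, y = block(i), block(j)
        if x != y:  # assumption: ignore pairs inside one block (self-loops)
            edges.add(frozenset((x, y)))
    return vertices, edges
```

Note that the pairs (1, 12) and (2, 11) both map to the single edge {1, 4}, which is exactly the loss of information discussed in the text: the implied structure graph does not record how many constraints, or which ones, connect two supervertices.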

Note that the implied structure graph as defined here does not mimic all
implications of the structure graph exactly. There might be multiple (but at most
three) complementary constraints between two vertices of $G_{\mathrm{impl}}$, and
furthermore the information about which pairs of the structure graph these are
is not encoded in the implied structure graph. In spite of these shortcomings, the
notion of the implied structure graph will turn out to be sufficient and helpful
for some observations in the rest of this paper.

In the next section we will prove that MRSO is APX-hard, i.e. there exists a
constant $c$ such that it is NP-hard to approximate the problem within a factor smaller
than $c$. For a general overview on the concept of approximation algorithms we
refer the reader to [6].

3 APX-Hardness and Lower Bounds for MRSO 

First, we consider the MRSO in its general form, i.e. as given by Definition 2.
Here we consider the case that vertices in the structure graph may have arbitrary
degree. This does not really correspond to the biological motivation, since usually
each nucleotide can pair with at most one other nucleotide, hence implying a maximum
degree of one. This restricted version of the problem will be considered later in
this paper.

We will show now that MRSO is a generalization of the MaxE3SAT problem
and thus, it is APX-hard and not approximable within a factor of $\frac{8}{7} - \varepsilon$, for
arbitrarily small $\varepsilon > 0$. Therefore, we first define the MaxE3SAT problem formally.



Definition 4. MaxE3SAT is defined as the following maximization problem. 
Given a boolean formula $\Phi = C_1 \wedge \dots \wedge C_m$ in 3-CNF over a set of variables
$X = \{x_1, \dots, x_n\}$, where each clause $C_i = c_{i,1} \vee c_{i,2} \vee c_{i,3}$ consists of exactly three
literals.

Compute an assignment $\varphi : X \to \{0, 1\}$ such that the number of satisfied
clauses is maximized.



Theorem 1. The MRSO is a generalization of the MaxE3SAT problem.

Proof. The idea of the proof is to use the characters of the sought
string $s$ of MRSO as boolean values and the structure graph to encode the valid
assignments.

Let $\Phi = C_1 \wedge \dots \wedge C_m$ be a boolean formula in 3-CNF over variables
$\{x_1, \dots, x_n\}$, where $C_i = c_{i,1} \vee c_{i,2} \vee c_{i,3}$. We assume that each variable occurs
at least once positively and at least once negatively in this formula. Otherwise
it could be directly simplified by removing all clauses in which these variables
occur, since we clearly are able to satisfy those clauses by an appropriate
assignment of the variables. The corresponding input for the MRSO problem is given
by



- $G := (V, E)$, where $V := \{1, \dots, 3m\}$, and

  $E := \{\{i,j\} \mid$ there exist integers $g, h, k, l$ such that
  $i = 3(g-1) + h$, $j = 3(k-1) + l$, and $c_{g,h} = \overline{c_{k,l}}\}$.

  Thus, the vertices in $G$ correspond to the occurrences of literals in the clauses
  of $\Phi$ and the edges in $G$ correspond to pairs of literals where the same variable
  occurs positively and negatively.

- For all $i$, $1 \le i \le m$, we define the function $f_i$ by the same function $f$:

  $$f(t_1, t_2, t_3) = \begin{cases} 0 & \text{if } (t_1, t_2, t_3) \notin \{a, \bar{a}\}^3, \\ 1 - \prod_{i=1}^{3} (1 - \psi(t_i)) & \text{otherwise,} \end{cases}$$

  where $\psi(a) = 1$ and $\psi(\bar{a}) = 0$. Thus, $f$ computes for each clause its boolean
  evaluation given an assignment of the variables in which we interpret $a$ as
  the boolean value 1 and $\bar{a}$ as the boolean value 0.



By the definition of $G$ the above reduction guarantees that all assignments
corresponding to a solution of the MRSO are well defined, i.e. that no variable is
assigned different boolean values in different clauses. Moreover, the function $f$
restricts useful solutions of the MRSO to those which only include characters $a$
and $\bar{a}$, and additionally counts the number of satisfied clauses by increasing the
cost of the solution for the MRSO. Hence, a feasible solution $s = s_1 \dots s_{3m}$ with
cost $k$ for the MRSO directly corresponds to an assignment for $\Phi$ that satisfies
$k$ clauses. □
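
The reduction in the proof of Theorem 1 can be illustrated by a short Python sketch. Everything about the encoding is our own assumption: clauses are triples of non-zero integers (v for a positive literal of variable v, -v for its negation), the barred letter is written as "A", and the function names are hypothetical. Each string position holds the truth value of the literal occurring there, so a is placed where the literal evaluates to true.

```python
from itertools import combinations

PSI = {"a": 1, "A": 0}  # psi(a) = 1, psi(a-bar) = 0; "A" encodes a-bar


def clause_value(t):
    """The function f from the proof: 0 outside {a, a-bar}^3, otherwise the
    boolean value of the clause, 1 - prod(1 - psi(t_i))."""
    if any(c not in PSI for c in t):
        return 0
    prod = 1
    for c in t:
        prod *= 1 - PSI[c]
    return 1 - prod


def maxe3sat_to_mrso(clauses):
    """Return (edges, functions) of the MRSO instance for a 3-CNF formula.

    Position 3(g-1)+h holds literal h of clause g (both 1-based); an edge
    joins two positions whose literals are complementary."""
    literals = [lit for clause in clauses for lit in clause]
    edges = {(i + 1, j + 1)
             for (i, li), (j, lj) in combinations(enumerate(literals), 2)
             if li == -lj}
    return edges, [clause_value] * len(clauses)


def assignment_to_string(clauses, assignment):
    """Map a truth assignment {variable: bool} to the corresponding string."""
    return "".join("a" if assignment[abs(lit)] == (lit > 0) else "A"
                   for clause in clauses for lit in clause)
```

As a usage example, for the clauses (x1, x2, x3), (not x1, not x2, not x3), (x1, not x2, x3) and the all-true assignment, the resulting string is aaaAAAaAa; it satisfies every complementary constraint and the sum of the clause values is 2, the number of satisfied clauses.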



By applying the inapproximability result for MaxE3SAT from [4] we can 
directly infer the following from Theorem 1. 

³ This is a convenient but not necessary restriction, since we could also extend the
interpretation to the whole alphabet $\Sigma$ by defining $\psi(b) = 1$ and $\psi(\bar{b}) = 0$.




